
WO2002079473A2 - Molecules for diagnostics and therapeutics - Google Patents

Molecules for diagnostics and therapeutics

Info

Publication number
WO2002079473A2
Authority
WO
WIPO (PCT)
Prior art keywords
polynucleotide
proteins
protein
ceu
dna
Prior art date
Application number
PCT/US2002/001009
Other languages
French (fr)
Other versions
WO2002079473A3 (en
Inventor
Scott R. Panzer
Stephen E. Lincoln
Christina M. Altus
Gerard E. Dufour
Jennifer L. Jackson
Anissa L. Jones
Tam C. Dam
Tommy F. Liu
Bernard Harris
Vincent Flores
Abel Daffo
Rakesh Marwaha
Alice J. Chen
Simon C. Chang
Edward H. Gerstin Jr.
Careyna H. Peralta
Marie H. David
Samantha A. Lewis
Original Assignee
Incyte Genomics, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Incyte Genomics, Inc.
Priority to EP02733781A (EP1366166A2)
Priority to US10/250,889 (US20040115629A1)
Priority to CA002434677A (CA2434677A1)
Publication of WO2002079473A2
Publication of WO2002079473A3

Classifications

    • C: CHEMISTRY; METALLURGY
    • C07: ORGANIC CHEMISTRY
    • C07K: PEPTIDES
    • C07K14/00: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/435: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans
    • C07K14/46: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates
    • C07K14/47: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates from mammals
    • A: HUMAN NECESSITIES
    • A01: AGRICULTURE; FORESTRY; ANIMAL HUSBANDRY; HUNTING; TRAPPING; FISHING
    • A01K: ANIMAL HUSBANDRY; AVICULTURE; APICULTURE; PISCICULTURE; FISHING; REARING OR BREEDING ANIMALS, NOT OTHERWISE PROVIDED FOR; NEW BREEDS OF ANIMALS
    • A01K2217/00: Genetically modified animals
    • A01K2217/05: Animals comprising random inserted nucleic acids (transgenic)
    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K: PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K38/00: Medicinal preparations containing peptides

Definitions

  • The present invention relates to human molecules and to the use of these sequences in the diagnosis, study, prevention, and treatment of diseases associated with, as well as effects of exogenous compounds on, the expression of human molecules.
  • The human genome comprises thousands of genes, many encoding gene products that function in the maintenance and growth of the various cells and tissues in the body. Aberrant expression of, or mutations in, these genes and their products are the cause of, or are associated with, a variety of human diseases such as cancer and other cell proliferative disorders, autoimmune/inflammatory disorders, infections, developmental disorders, endocrine disorders, metabolic disorders, neurological disorders, gastrointestinal disorders, transport disorders, and connective tissue disorders.
  • The identification of these genes and their products is the basis of an ever-expanding effort to find markers for early detection of diseases, and targets for their prevention and treatment. Therefore, these genes and their products are useful as diagnostics and therapeutics.
  • Genes may encode, for example, enzyme molecules, molecules associated with growth and development, biochemical pathway molecules, extracellular information transmission molecules, receptor molecules, intracellular signaling molecules, membrane transport molecules, protein modification and maintenance molecules, nucleic acid synthesis and modification molecules, adhesion molecules, antigen recognition molecules, secreted and extracellular matrix molecules, cytoskeletal molecules, ribosomal molecules, electron transfer associated molecules, transcription factor molecules, chromatin molecules, cell membrane molecules, and organelle associated molecules.
  • Cancer represents a type of cell proliferative disorder that affects nearly every tissue in the body.
  • A wide variety of molecules, either aberrantly expressed or mutated, can be the cause of, or involved with, various cancers because tissue growth involves complex and ordered patterns of cell proliferation, cell differentiation, and apoptosis.
  • Cell proliferation must be regulated to maintain both the number of cells and their spatial organization. This regulation depends upon the appropriate expression of proteins which control cell cycle progression in response to extracellular signals such as growth factors and other mitogens, and intracellular cues such as DNA damage or nutrient starvation.
  • Molecules which directly or indirectly modulate cell cycle progression fall into several categories, including growth factors and their receptors, second messenger and signal transduction proteins, oncogene products, tumor-suppressor proteins, and mitosis-promoting factors. Aberrant expression or mutations in any of these gene products can result in cell proliferative disorders such as cancer.
  • Oncogenes are genes generally derived from normal genes that, through abnormal expression or mutation, can effect the transformation of a normal cell to a malignant one (oncogenesis).
  • Oncoproteins, encoded by oncogenes, can affect cell proliferation in a variety of ways and include growth factors, growth factor receptors, intracellular signal transducers, nuclear transcription factors, and cell-cycle control proteins.
  • Tumor-suppressor genes are involved in inhibiting cell proliferation. Mutations which cause reduced function or loss of function in tumor-suppressor genes result in aberrant cell proliferation and cancer. Although many different genes and their products have been found to be associated with cell proliferative disorders such as cancer, many more may exist that are yet to be discovered.
  • DNA-based arrays can provide a simple way to explore the expression of a single polymorphic gene or a large number of genes. When the expression of a single gene is explored, DNA-based arrays are employed to detect the expression of specific gene variants. For example, a p53 tumor suppressor gene array is used to determine whether individuals are carrying mutations that predispose them to cancer. A cytochrome p450 gene array is useful to determine whether individuals have one of a number of specific mutations that could result in increased drug metabolism, drug resistance or drug toxicity.
  • DNA-based array technology is especially relevant for the rapid screening of expression of a large number of genes.
  • A genetic predisposition, disease or therapeutic treatment may affect, directly or indirectly, the expression of a large number of genes.
  • The interactions may be expected, such as when the genes are part of the same signaling pathway.
  • The interactions may be totally unexpected. Therefore, DNA-based arrays can be used to investigate how genetic predisposition, disease, or therapeutic treatment affects the expression of a large number of genes.
  • The cellular processes of biogenesis and biodegradation involve a number of key enzyme classes including oxidoreductases, transferases, hydrolases, lyases, isomerases, and ligases. These enzyme classes each comprise numerous substrate-specific enzymes having precise and well regulated functions. These enzymes function by facilitating metabolic processes such as glycolysis, the tricarboxylic acid cycle, and fatty acid metabolism; synthesis or degradation of amino acids, steroids, phospholipids, alcohols, etc.; regulation of cell signalling, proliferation, inflammation, apoptosis, etc.; and through catalyzing critical steps in DNA replication and repair, and the process of translation.
  • Oxidoreductases
  • Many pathways of biogenesis and biodegradation require oxidoreductase (dehydrogenase or reductase) activity, coupled to the reduction or oxidation of a donor or acceptor cofactor.
  • Potential cofactors include cytochromes, oxygen, disulfide, iron-sulfur proteins, flavin adenine dinucleotide (FAD), and the nicotinamide adenine dinucleotides NAD and NADP (Newsholme, E.A. and A.R. Leech (1983) Biochemistry for the Medical Sciences, John Wiley and Sons, Chichester, U.K., pp. 779-793).
  • Reductase activity catalyzes the transfer of electrons between substrate(s) and cofactor(s) with concurrent oxidation of the cofactor.
  • The reverse dehydrogenase reaction catalyzes the reduction of a cofactor and consequent oxidation of the substrate.
  • Oxidoreductase enzymes are a broad superfamily of proteins that catalyze numerous reactions in all cells of organisms ranging from bacteria to plants to humans. These reactions include metabolism of sugar, certain detoxification reactions in the liver, and the synthesis or degradation of fatty acids, amino acids, glucocorticoids, estrogens, androgens, and prostaglandins.
  • Family members often have distinct cellular localizations, including the cytosol, the plasma membrane, mitochondrial inner or outer membrane, and peroxisomes.
  • Short-chain alcohol dehydrogenases (SCADs) are a family of dehydrogenases that share only 15% to 30% sequence identity, with similarity predominantly in the coenzyme binding domain and the substrate binding domain.
  • SCADs are also involved in synthesis and degradation of fatty acids, steroids, and some prostaglandins, and are therefore implicated in a variety of disorders such as lipid storage disease, myopathy, SCAD deficiency, and certain genetic disorders.
  • Retinol dehydrogenase is a SCAD-family member (Simon, A. et al. (1995) J. Biol. Chem.
  • Retinol dehydrogenase has been linked to hereditary eye diseases such as autosomal recessive childhood-onset severe retinal dystrophy (Simon, A. et al. (1996) Genomics 36:424-430).
  • Propagation of nerve impulses, modulation of cell proliferation and differentiation, induction of the immune response, and tissue homeostasis involve neurotransmitter metabolism (Weiss, B. (1991) Neurotoxicology 12:379-386; Collins, S.M. et al. (1992) Ann. N.Y. Acad. Sci. 664:415-424; Brown, J.K. and H. Imam (1991) J. Inherit. Metab. Dis. 14:436-458). Many pathways of neurotransmitter metabolism require oxidoreductase activity, coupled to reduction or oxidation of a cofactor, such as NAD+/NADH (Newsholme, E.A. and A.R. Leech (1983) Biochemistry for the Medical Sciences.
  • Neurotransmitter degradation pathways that utilize NAD+/NADH-dependent oxidoreductase activity include those of L-DOPA (precursor of dopamine, a neuronal excitatory compound), glycine (an inhibitory neurotransmitter in the brain and spinal cord), histamine (liberated from mast cells during the inflammatory response), and taurine (an inhibitory neurotransmitter of the brain stem, spinal cord and retina) (Newsholme, supra, pp. 790, 792).
  • Tetrahydrofolate is a derivatized glutamate molecule that acts as a carrier, providing activated one-carbon units to a wide variety of biosynthetic reactions, including synthesis of purines, pyrimidines, and the amino acid methionine. Tetrahydrofolate is generated by the activity of a holoenzyme complex called tetrahydrofolate synthase, which includes three enzyme activities: tetrahydrofolate dehydrogenase, tetrahydrofolate cyclohydrolase, and tetrahydrofolate synthetase.
  • 3-Hydroxyacyl-CoA dehydrogenase (3HACD) is involved in fatty acid metabolism. It catalyzes the oxidation of 3-hydroxyacyl-CoA to 3-oxoacyl-CoA, with concomitant reduction of NAD+ to NADH, in the mitochondria and peroxisomes of eukaryotic cells. In peroxisomes, 3HACD and enoyl-CoA hydratase form an enzyme complex called bifunctional enzyme, defects in which are associated with peroxisomal bifunctional enzyme deficiency.
  • 3HACD has been shown to bind the amyloid-β (Aβ) peptide, and is overexpressed in neurons affected in Alzheimer's disease.
  • An antibody against 3HACD can block the toxic effects of Aβ in a cell culture model of Alzheimer's disease (Yan, S. et al. (1997) Nature 389:689-695; OMIM #602057).
  • Steroids such as estrogen, testosterone, corticosterone, and others are generated from a common precursor, cholesterol, and are interconverted into one another.
  • A wide variety of enzymes act upon cholesterol, including a number of dehydrogenases.
  • Steroid dehydrogenases, such as the hydroxysteroid dehydrogenases, are involved in hypertension, fertility, and cancer (Duax, W.L. and D. Ghosh (1997) Steroids 62:95-100).
  • One such dehydrogenase is 3-oxo-5-α-steroid dehydrogenase (OASD), a microsomal membrane protein highly expressed in prostate and other androgen-responsive tissues.
  • OASD catalyzes the conversion of testosterone into dihydrotestosterone, which is the most potent androgen.
  • Dihydrotestosterone is essential for the formation of the male phenotype during embryogenesis, as well as for proper androgen-mediated growth of tissues such as the prostate and male genitalia.
  • A defect in OASD that prevents the conversion of testosterone into dihydrotestosterone leads to a rare form of male pseudohermaphroditism, characterized by defective formation of the external genitalia (Andersson, S. et al. (1991) Nature 354:159-161; Labrie, F. et al. (1992) Endocrinology 131:1571-1573; OMIM #264600).
  • OASD plays a central role in sexual differentiation and androgen physiology.
  • 17β-hydroxysteroid dehydrogenase (17βHSD6) plays an important role in the regulation of the male reproductive hormone, dihydrotestosterone (DHTT).
  • 17βHSD6 acts to reduce levels of DHTT by oxidizing a precursor of DHTT, 3α-diol, to androsterone, which is readily glucuronidated and removed from tissues.
  • 17βHSD6 is active with both androgen and estrogen substrates when expressed in embryonic kidney 293 cells. At least five other isozymes of 17βHSD have been identified that catalyze oxidation and/or reduction reactions in various tissues with preferences for different steroid substrates (Biswas, M.G. and D.W. Russell (1997) J. Biol. Chem.
  • 17βHSD1 preferentially reduces estradiol and is abundant in the ovary and placenta.
  • 17βHSD2 catalyzes oxidation of androgens and is present in the endometrium and placenta.
  • 17βHSD3 is exclusively a reductive enzyme in the testis (Geissler, W.M. et al. (1994) Nat. Genet. 7:34-39).
  • An excess of androgens such as DHTT can contribute to certain disease states such as benign prostatic hyperplasia and prostate cancer.
  • Oxidoreductases are components of the fatty acid metabolism pathways in mitochondria and peroxisomes.
  • The main beta-oxidation pathway degrades both saturated and unsaturated fatty acids, while the auxiliary pathway performs additional steps required for the degradation of unsaturated fatty acids.
  • The auxiliary beta-oxidation enzyme 2,4-dienoyl-CoA reductase catalyzes the removal of even-numbered double bonds from unsaturated fatty acids prior to their entry into the main beta-oxidation pathway.
  • The enzyme may also remove odd-numbered double bonds from unsaturated fatty acids (Koivuranta, K.T. et al. (1994) Biochem. J. 304:787-792; Smeland, T.E. et al. (1992) Proc. Natl. Acad. Sci. USA 89:6673-6677).
  • 2,4-dienoyl-CoA reductase is located in both mitochondria and peroxisomes. Inherited deficiencies in mitochondrial and peroxisomal beta-oxidation enzymes are associated with severe diseases, some of which manifest themselves soon after birth and lead to death within a few years. Defects in beta-oxidation are associated with Reye's syndrome, Zellweger syndrome, neonatal adrenoleukodystrophy, infantile Refsum's disease, acyl-CoA oxidase deficiency, and bifunctional protein deficiency (Suzuki, Y. et al. (1994) Am. J. Hum. Genet. 54:36-43; Hoefler, supra; Cotran, R.S. et al.
  • Peroxisomal beta-oxidation is impaired in cancerous tissue. Although neoplastic human breast epithelial cells have the same number of peroxisomes as do normal cells, fatty acyl-CoA oxidase activity is lower than in control tissue (el Bouhtoury, F. et al. (1992) J. Pathol. 166:27-35). Human colon carcinomas have fewer peroxisomes than normal colon tissue and have lower fatty-acyl-CoA oxidase and bifunctional enzyme (including enoyl-CoA hydratase) activities than normal tissue (Cable, S. et al.
  • Another important oxidoreductase is isocitrate dehydrogenase, which catalyzes the conversion of isocitrate to α-ketoglutarate, a substrate of the citric acid cycle.
  • Isocitrate dehydrogenase can be either NAD or NADP dependent, and is found in the cytosol, mitochondria, and peroxisomes. Activity of isocitrate dehydrogenase is regulated developmentally, and by hormones, neurotransmitters, and growth factors.
  • Hydroxypyruvate reductase (HPR), a peroxisomal 2-hydroxyacid dehydrogenase in the glycolate pathway, catalyzes the conversion of hydroxypyruvate to glycerate with the oxidation of both NADH and NADPH.
  • The reverse dehydrogenase reaction reduces NAD+ and NADP+.
  • HPR recycles nucleotides and bases back into pathways leading to the synthesis of ATP and GTP. ATP and GTP are used to produce DNA and RNA and to control various aspects of signal transduction and energy metabolism.
  • Inhibitors of purine nucleotide biosynthesis have long been employed as antiproliferative agents to treat cancer and viral diseases. HPR also regulates biochemical synthesis of serine and cellular serine levels available for protein synthesis.
  • The mitochondrial electron transport (or respiratory) chain is a series of oxidoreductase-type enzyme complexes in the mitochondrial membrane that is responsible for the transport of electrons from NADH through a series of redox centers within these complexes to oxygen, and the coupling of this oxidation to the synthesis of ATP (oxidative phosphorylation). ATP then provides the primary source of energy for driving a cell's many energy-requiring reactions.
  • The key complexes in the respiratory chain are NADH:ubiquinone oxidoreductase (complex I), succinate:ubiquinone oxidoreductase (complex II), cytochrome c1-b oxidoreductase (complex III), cytochrome c oxidase (complex IV), and ATP synthase (complex V) (Alberts, B. et al. (1994) Molecular Biology of the Cell, Garland Publishing, Inc., New York NY, pp. 677-678). All of these complexes are located on the inner matrix side of the mitochondrial membrane except complex II, which is on the cytosolic side.
  • Complex II transports electrons generated in the citric acid cycle to the respiratory chain.
  • The electrons generated by oxidation of succinate to fumarate in the citric acid cycle are transferred through electron carriers in complex II to membrane bound ubiquinone (Q).
  • Transcriptional regulation of these nuclear-encoded genes appears to be the predominant means for controlling the biogenesis of respiratory enzymes. Defects and altered expression of enzymes in the respiratory chain are associated with a variety of disease conditions.
  • 3-Hydroxyisobutyrate dehydrogenase, important in valine catabolism, catalyzes the NAD-dependent oxidation of 3-hydroxyisobutyrate to methylmalonate semialdehyde within mitochondria. Elevated levels of 3-hydroxyisobutyrate have been reported in a number of disease states, including ketoacidosis, methylmalonic acidemia, and other disorders associated with deficiencies in methylmalonate semialdehyde dehydrogenase (Rougraff, P.M. et al. (1989) J. Biol. Chem. 264:5899-5903).
  • Isovaleryl-CoA dehydrogenase (IVD) is involved in leucine metabolism and catalyzes the oxidation of isovaleryl-CoA to 3-methylcrotonyl-CoA.
  • Human IVD is a tetrameric flavoprotein that is encoded in the nucleus and synthesized in the cytosol as a 45 kDa precursor with a mitochondrial import signal sequence.
  • A genetic deficiency, caused by a mutation in the gene encoding IVD, results in the condition known as isovaleric acidemia. This mutation results in inefficient mitochondrial import and processing of the IVD precursor (Vockley, J. et al. (1992) J. Biol. Chem. 267:2494-2501).
  • Transferases
  • Transferases are enzymes that catalyze the transfer of molecular groups. The reaction may involve an oxidation, reduction, or cleavage of covalent bonds, and is often specific to a substrate or to particular sites on a type of substrate. Transferases participate in reactions essential to such functions as synthesis and degradation of cell components, regulation of cell functions including cell signaling, cell proliferation, inflammation, apoptosis, secretion and excretion. Transferases are involved in key steps in disease processes involving these functions. Transferases are frequently classified according to the type of group transferred.
  • For example, methyl transferases transfer one-carbon methyl groups, amino transferases transfer nitrogenous amino groups, and similarly denominated enzymes transfer aldehyde or ketone, acyl, glycosyl, alkyl or aryl, isoprenyl, saccharyl, phosphorous-containing, sulfur-containing, or selenium-containing groups, as well as small enzymatic groups such as Coenzyme A.
  • Acyl transferases include peroxisomal carnitine octanoyl transferase, which is involved in the fatty acid beta-oxidation pathway, and mitochondrial carnitine palmitoyl transferases, involved in fatty acid metabolism and transport. Choline O-acetyl transferase catalyzes the biosynthesis of the neurotransmitter acetylcholine.
  • Amino transferases play key roles in protein synthesis and degradation, and they contribute to other processes as well.
  • The amino transferase 5-aminolevulinic acid synthase catalyzes the addition of succinyl-CoA to glycine, the first step in heme biosynthesis.
  • Other amino transferases participate in pathways important for neurological function and metabolism.
  • Glutamine-phenylpyruvate amino transferase, also known as glutamine transaminase K (GTK), catalyzes the reversible conversion of L-glutamine and phenylpyruvate to 2-oxoglutaramate and L-phenylalanine.
  • Other amino acid substrates for GTK include L-methionine, L-histidine, and L-tyrosine.
  • GTK also catalyzes the conversion of kynurenine to kynurenic acid, a tryptophan metabolite that is an antagonist of the N-methyl-D-aspartate (NMDA) receptor in the brain and may exert a neuromodulatory function. Alteration of the kynurenine metabolic pathway may be associated with several neurological disorders.
  • GTK also plays a role in the metabolism of halogenated xenobiotics conjugated to glutathione, leading to nephrotoxicity in rats and neurotoxicity in humans.
  • GTK is expressed in kidney, liver, and brain.
  • Both human and rat GTKs contain a putative pyridoxal phosphate binding site (ExPASy ENZYME: EC 2.6.1.64; Perry, S.J. et al. (1993) Mol. Pharmacol. 43:660-665; Perry, S. et al. (1995) FEBS Lett. 360:277-280; and Alberati-Giani, D. et al. (1995) J. Neurochem. 64:1448-1455).
  • A second amino transferase associated with this pathway is kynurenine/α-aminoadipate amino transferase (AadAT).
  • AadAT catalyzes the reversible conversion of α-aminoadipate and α-ketoglutarate to α-ketoadipate and L-glutamate during lysine metabolism.
  • AadAT also catalyzes the transamination of kynurenine to kynurenic acid.
  • A cytosolic AadAT is expressed in rat kidney, liver, and brain (Nakatani, Y. et al. (1970) Biochim. Biophys. Acta 198:219-228; Buchli, R. et al. (1995) J. Biol. Chem. 270:29330-29335).
  • Glycosyl transferases include the mammalian UDP-glucuronosyl transferases, a family of membrane-bound microsomal enzymes catalyzing the transfer of glucuronic acid to lipophilic substrates in reactions that play important roles in detoxification and excretion of drugs, carcinogens, and other foreign substances.
  • Another mammalian glycosyl transferase, UDP-galactose-ceramide galactosyl transferase, catalyzes the transfer of galactose to ceramide in the synthesis of galactocerebrosides in myelin membranes of the nervous system.
  • The UDP-glycosyl transferases share a conserved signature domain of about 50 amino acid residues (PROSITE: PDOC00359, http://expasy.hcuge.ch/sprot/prosite.html).
  • Methyl transferases are involved in a variety of pharmacologically important processes. Nicotinamide N-methyl transferase catalyzes the N-methylation of nicotinamides and other pyridines, an important step in the cellular handling of drugs and other foreign compounds. Phenylethanolamine N-methyl transferase catalyzes the conversion of noradrenalin to adrenalin. 6-O-methylguanine-DNA methyl transferase reverses DNA methylation, an important step in carcinogenesis.
  • Uroporphyrin-III C-methyl transferase, which catalyzes the transfer of two methyl groups from S-adenosyl-L-methionine to uroporphyrinogen III, is the first specific enzyme in the biosynthesis of cobalamin, a dietary vitamin whose uptake is deficient in pernicious anemia.
  • Protein-arginine methyl transferases catalyze the posttranslational methylation of arginine residues in proteins, resulting in the mono- and dimethylation of arginine on the guanidino group.
  • Substrates include histones, myelin basic protein, and heterogeneous nuclear ribonucleoproteins involved in mRNA processing, splicing, and transport.
  • Protein-arginine methyl transferase interacts with proteins upregulated by mitogens, with proteins involved in chronic lymphocytic leukemia, and with interferon, suggesting an important role for methylation in cytokine receptor signaling (Lin, W.-J. et al. (1996) J. Biol. Chem. 271:15034-15044; Abramovich, C. et al. (1997) EMBO J. 16:260-266; and Scott, H.S. et al. (1998) Genomics 48:330-340).
  • Phosphotransferases catalyze the transfer of high-energy phosphate groups and are important in energy-requiring and -releasing reactions.
  • The metabolic enzyme creatine kinase catalyzes the reversible phosphate transfer between creatine/creatine phosphate and ATP/ADP.
  • Glycocyamine kinase catalyzes phosphate transfer from ATP to guanidoacetate, and arginine kinase catalyzes phosphate transfer from ATP to arginine.
  • A cysteine-containing active site is conserved in this family (PROSITE: PDOC00103).
  • Prenyl transferases are heterodimers, consisting of an alpha and a beta subunit, that catalyze the transfer of an isoprenyl group.
  • An example of a prenyl transferase is the mammalian protein farnesyl transferase.
  • The alpha subunit of farnesyl transferase consists of 5 repeats of 34 amino acids each, with each repeat containing an invariant tryptophan (PROSITE: PDOC00703).
  • Saccharyl transferases are glycating enzymes involved in a variety of metabolic processes. Oligosaccharyl transferase-48, for example, is a receptor for advanced glycation endproducts. Accumulation of these endproducts is observed in vascular complications of diabetes, macrovascular disease, renal insufficiency, and Alzheimer's disease (Thornalley, P.J. (1998) Cell Mol. Biol. (Noisy-le-grand) 44:1013-1023).
  • Coenzyme A (CoA) transferase catalyzes the transfer of CoA between two carboxylic acids.
  • Succinyl CoA:3-oxoacid CoA transferase, for example, transfers CoA from succinyl-CoA to a recipient such as acetoacetate.
  • Acetoacetate is essential to the metabolism of ketone bodies, which accumulate in tissues affected by metabolic disorders such as diabetes (PROSITE: PDOC00980).
  • Hydrolases
  • Hydrolysis is the breaking of a covalent bond in a substrate by introduction of a molecule of water.
  • The reaction involves a nucleophilic attack by the water molecule's oxygen atom on a target bond in the substrate.
  • The water molecule is split across the target bond, breaking the bond and generating two product molecules.
  • Hydrolases participate in reactions essential to such functions as synthesis and degradation of cell components, and for regulation of cell functions including cell signaling, cell proliferation, inflammation, apoptosis, secretion and excretion. Hydrolases are involved in key steps in disease processes involving these functions.
  • Hydrolytic enzymes may be grouped by substrate specificity into classes including phosphatases, peptidases, lysophospholipases, phosphodiesterases, glycosidases, and glyoxalases.
  • Small lysophospholipase (LPL) isoforms, approximately 15-30 kD, function as hydrolases; larger isoforms function both as hydrolases and transacylases.
  • A particular substrate for LPLs, lysophosphatidylcholine, causes lysis of cell membranes.
  • LPL activity is regulated by signaling molecules important in numerous pathways, including the inflammatory response.
  • Peptidases, also called proteases, cleave peptide bonds that form the backbone of peptide or protein chains. Proteolytic processing is essential to cell growth, differentiation, remodeling, and homeostasis as well as inflammation and immune response. Since typical protein half-lives range from hours to a few days, peptidases are continually cleaving precursor proteins to their active form, removing signal sequences from targeted proteins, and degrading aged or defective proteins.
  • Peptidases function in bacterial, parasitic, and viral invasion and replication within a host.
  • Peptidases include trypsin and chymotrypsin (components of the complement cascade and the blood-clotting cascade), lysosomal cathepsins, calpains, pepsin, renin, and chymosin (Beynon, R.J. and J.S. Bond (1994) Proteolytic Enzymes: A Practical Approach, Oxford University Press, New York NY, pp. 1-5).
  • The phosphodiesterases catalyze the hydrolysis of one of the two ester bonds in a phosphodiester compound. Phosphodiesterases are therefore crucial to a variety of cellular processes. Phosphodiesterases include DNA and RNA endo- and exo-nucleases, which are essential to cell growth and replication as well as protein synthesis. Another phosphodiesterase is acid sphingomyelinase, which hydrolyzes the membrane phospholipid sphingomyelin to ceramide and phosphorylcholine. Phosphorylcholine is used in the synthesis of phosphatidylcholine, which is involved in numerous intracellular signaling pathways.
  • Ceramide is an essential precursor for the generation of gangliosides, membrane lipids found in high concentration in neural tissue.
  • Defective acid sphingomyelinase phosphodiesterase leads to a build-up of sphingomyelin molecules in lysosomes, resulting in Niemann-Pick disease.
  • Glycosidases catalyze the cleavage of hemiacetal bonds of glycosides, which are compounds that contain one or more sugars.
  • Mammalian lactase-phlorizin hydrolase, for example, is an intestinal enzyme that splits lactose.
  • Mammalian beta-galactosidase removes the terminal galactose from gangliosides, glycoproteins, and glycosaminoglycans, and deficiency of this enzyme is associated with a gangliosidosis known as Morquio disease type B.
  • Vertebrate lysosomal alpha-glucosidase hydrolyzes glycogen, maltose, and isomaltose.
  • Vertebrate intestinal sucrase-isomaltase hydrolyzes sucrose, maltose, and isomaltose.
  • The glyoxalase system is involved in gluconeogenesis, the production of glucose from storage compounds in the body. It consists of glyoxalase I, which catalyzes the formation of S-D-lactoylglutathione from methylglyoxal, a side product of triose-phosphate energy metabolism, and glyoxalase II, which hydrolyzes S-D-lactoylglutathione to D-lactic acid and reduced glutathione. Glyoxalases are involved in hyperglycemia, non-insulin-dependent diabetes mellitus, the detoxification of bacterial toxins, and in the control of cell proliferation and microtubule assembly.
Lyases
  • Lyases are a class of enzymes that catalyze the cleavage of C-C, C-O, C-N, C-S, C-(halide), P-O or other bonds without hydrolysis or oxidation to form two molecules, at least one of which contains a double bond (Stryer, L. (1995) Biochemistry, W.H. Freeman and Co., New York NY, p. 620). Lyases are critical components of cellular biochemistry with roles in metabolic energy production including fatty acid metabolism, as well as other diverse enzymatic processes. Further classification of lyases reflects the type of bond cleaved as well as the nature of the cleaved group.
  • The group of C-C lyases includes carboxyl-lyases (decarboxylases), aldehyde-lyases (aldolases), oxo-acid-lyases and others.
  • The C-O lyase group includes hydro-lyases, lyases acting on polysaccharides and other lyases.
  • The C-N lyase group includes ammonia-lyases, amidine-lyases, amine-lyases (deaminases) and other lyases.
  • Proper regulation of lyases is critical to normal physiology.
  • Mutation-induced deficiencies in uroporphyrinogen decarboxylase can lead to photosensitive cutaneous lesions in the genetically-linked disorder familial porphyria cutanea tarda (Mendez, M. et al. (1998) Am. J. Genet. 63:1363-1375).
  • Adenosine deaminase (ADA) deficiency stems from genetic mutations in the ADA gene, resulting in the disorder severe combined immunodeficiency disease (SCID) (Hershfield, M.S. (1998) Semin. Hematol. 35:291-298).
  • SCID severe combined immunodeficiency disease
  • Isomerases are a class of enzymes that catalyze geometric or structural changes within a molecule to form a single product. This class includes racemases and epimerases, cis-trans-isomerases, intramolecular oxidoreductases, intramolecular transferases (mutases) and intramolecular lyases. Isomerases are critical components of cellular biochemistry with roles in metabolic energy production including glycolysis, as well as other diverse enzymatic processes (Stryer, L. (1995) Biochemistry, W.H. Freeman and Co., New York NY, pp. 483-507).
  • Racemases are a subset of isomerases that catalyze inversion of a molecule's configuration around the asymmetric carbon atom in a substrate having a single center of asymmetry, thereby interconverting two racemers.
  • Epimerases are another subset of isomerases that catalyze inversion of configuration around an asymmetric carbon atom in a substrate with more than one center of asymmetry, thereby interconverting two epimers. Racemases and epimerases can act on amino acids and derivatives, hydroxy acids and derivatives, as well as carbohydrates and derivatives.
  • The interconversion of UDP-galactose and UDP-glucose is catalyzed by UDP-galactose-4'-epimerase.
  • Oxidoreductases can be isomerases as well. Oxidoreductases catalyze the reversible transfer of electrons from a substrate that becomes oxidized to a substrate that becomes reduced. This class of enzymes includes dehydrogenases, hydroxylases, oxidases, oxygenases, peroxidases, and reductases.
  • Proper maintenance of oxidoreductase levels is physiologically important.
  • Genetically-linked deficiencies in lipoamide dehydrogenase can result in lactic acidosis (Robinson, B.H. et al. (1977) Pediat. Res. 11:1198-1202).
  • Another subgroup of isomerases are the transferases (or mutases). Transferases transfer a chemical group from one compound (the donor) to another compound (the acceptor).
  • The types of groups transferred by these enzymes include acyl groups, amino groups, phosphate groups (phosphotransferases or phosphomutases), and others.
  • The transferase carnitine palmitoyltransferase is an important component of fatty acid metabolism.
  • Topoisomerases are enzymes that affect the topological state of DNA. For example, defects in topoisomerases or their regulation can affect normal physiology. Reduced levels of topoisomerase II have been correlated with some of the DNA processing defects associated with the disorder ataxia-telangiectasia (Singh, S.P. et al. (1988) Nucleic Acids Res. 16:3919-3929).
Ligases
  • Ligases catalyze the formation of a bond between two substrate molecules. The process involves the hydrolysis of a pyrophosphate bond in ATP or a similar energy donor. Ligases are classified based on the nature of the type of bond they form, which can include carbon-oxygen, carbon-sulfur, carbon-nitrogen, carbon-carbon and phosphoric ester bonds.
  • Ligases forming carbon-oxygen bonds include the aminoacyl-transfer RNA (tRNA) synthetases, which are important RNA-associated enzymes with roles in translation. Protein biosynthesis depends on each amino acid forming a linkage with the appropriate tRNA. The aminoacyl-tRNA synthetases are responsible for the activation and correct attachment of an amino acid with its cognate tRNA.
  • The 20 aminoacyl-tRNA synthetase enzymes can be divided into two structural classes, and each class is characterized by a distinctive topology of the catalytic domain. Class I enzymes contain a catalytic domain based on the nucleotide-binding Rossmann fold.
  • Class II enzymes contain a central catalytic domain, which consists of a seven-stranded antiparallel β-sheet motif, as well as N- and C-terminal regulatory domains. Class II enzymes are separated into two groups based on the heterodimeric or homodimeric structure of the enzyme; the latter group is further subdivided by the structure of the N- and C-terminal regulatory domains (Hartlein, M. and S. Cusack (1995) J. Mol. Evol. 40:519-530). Autoantibodies against aminoacyl-tRNAs are generated by patients with dermatomyositis and polymyositis, and correlate strongly with complicating interstitial lung disease (ILD). These antibodies appear to be generated in response to viral infection, and coxsackie virus has been used to induce experimental viral myositis in animals.
  • ILD interstitial lung disease
  • Ligases forming carbon-sulfur bonds mediate a large number of cellular biosynthetic intermediary metabolism processes involving intermolecular transfer of carbon atom-containing substrates (carbon substrates). Examples of such reactions include the tricarboxylic acid cycle, synthesis of fatty acids and long-chain phospholipids, synthesis of alcohols and aldehydes, synthesis of intermediary metabolites, and reactions involved in the amino acid degradation pathways. Some of these reactions require input of energy, usually in the form of conversion of ATP to either ADP or AMP and pyrophosphate.
  • A carbon substrate is derived from a small molecule containing at least two carbon atoms.
  • The carbon substrate is often covalently bound to a larger molecule which acts as a carbon substrate carrier molecule within the cell.
  • The carrier molecule is coenzyme A.
  • Coenzyme A is structurally related to derivatives of the nucleotide ADP and consists of 4'-phosphopantetheine linked via a phosphodiester bond to the alpha phosphate group of adenosine 3',5'-bisphosphate. The terminal thiol group of 4'-phosphopantetheine acts as the site for carbon substrate bond formation.
  • The predominant carbon substrates which utilize CoA as a carrier molecule during biosynthesis and intermediary metabolism in the cell are acetyl, succinyl, and propionyl moieties, collectively referred to as acyl groups.
  • Other carbon substrates include enoyl lipid, which acts as a fatty acid oxidation intermediate, and carnitine, which acts as an acetyl-CoA flux regulator/mitochondrial acyl group transfer protein.
  • Acyl-CoA and acetyl-CoA are synthesized in the cell by acyl-CoA synthetase and acetyl-CoA synthetase, respectively.
  • Acyl-CoA synthetase activity: i) acetyl-CoA synthetase, which activates acetate and several other low molecular weight carboxylic acids and is found in muscle mitochondria and the cytosol of other tissues; ii) medium-chain acyl-CoA synthetase, which activates fatty acids containing between four and eleven carbon atoms (predominantly from dietary sources), and is present only in liver mitochondria; and iii) acyl CoA synthetase, which is specific for long chain fatty acids with between six and twenty carbon atoms, and is found in microsomes and the mitochondria.
  • Acyl-CoA synthetase activity has been identified from many sources including bacteria, yeast, plants, mouse, and man.
  • The activity of acyl-CoA synthetase may be modulated by phosphorylation of the enzyme by cAMP-dependent protein kinase.
  • Ligases forming carbon-nitrogen bonds include amide synthases such as glutamine synthetase (glutamate-ammonia ligase) that catalyzes the amination of glutamic acid to glutamine by ammonia using the energy of ATP hydrolysis.
  • Glutamine is the primary source for the amino group in various amide transfer reactions involved in de novo pyrimidine nucleotide synthesis and in purine and pyrimidine ribonucleotide interconversions.
  • Overexpression of glutamine synthetase has been observed in primary liver cancer (Christa, L. et al. (1994) Gastroent. 106:1312-1320).
  • Acid-amino-acid ligases are represented by the ubiquitin proteases which are associated with the ubiquitin conjugation system (UCS), a major pathway for the degradation of cellular proteins in eukaryotic cells and some bacteria.
  • UCS ubiquitin conjugation system
  • The UCS mediates the elimination of abnormal proteins and regulates the half-lives of important regulatory proteins that control cellular processes such as gene transcription and cell cycle progression.
  • Proteins targeted for degradation are conjugated to ubiquitin (Ub), a small heat stable protein.
  • Ub is first activated by a ubiquitin-activating enzyme (E1), and then transferred to one of several Ub-conjugating enzymes (E2).
  • E2 then links the Ub molecule through its C-terminal glycine to an internal lysine (acceptor lysine) of a target protein.
  • The ubiquitinated protein is then recognized and degraded by the proteasome, a large, multisubunit proteolytic enzyme complex, and ubiquitin is released for reutilization by ubiquitin protease.
  • The UCS is implicated in the degradation of mitotic cyclic kinases, oncoproteins, tumor suppressor genes such as p53, viral proteins, cell surface receptors associated with signal transduction, transcriptional regulators, and mutated or damaged proteins (Ciechanover, A. (1994) Cell 79:13-21).
  • A murine proto-oncogene, Unp, encodes a nuclear ubiquitin protease whose overexpression leads to oncogenic transformation of NIH3T3 cells, and the human homolog of this gene is consistently elevated in small cell tumors and adenocarcinomas of the lung (Gray, D.A. (1995) Oncogene 10:2179-2183).
  • Cyclo-ligases and other carbon-nitrogen ligases comprise various enzymes and enzyme complexes that participate in the de novo pathways to purine and pyrimidine biosynthesis. Because these pathways are critical to the synthesis of nucleotides for replication of both RNA and DNA, many of these enzymes have been the targets of clinical agents for the treatment of cell proliferative disorders such as cancer and infectious diseases.
  • Purine biosynthesis occurs de novo from the amino acids glycine and glutamine, and other small molecules.
  • Three of the key reactions in this process are catalyzed by a trifunctional enzyme composed of glycinamide-ribonucleotide synthetase (GARS), aminoimidazole ribonucleotide synthetase (AIRS), and glycinamide ribonucleotide transformylase (GART).
  • AIRS aminoimidazole ribonucleotide synthetase
  • GART glycinamide ribonucleotide transformylase
  • Adenylosuccinate synthetase catalyzes a later step in purine biosynthesis that converts inosinic acid to adenylosuccinate, a key step on the path to ATP synthesis.
  • This enzyme is also similar to another carbon-nitrogen ligase, argininosuccinate synthetase, that catalyzes a similar reaction in the urea cycle (Powell, S.M. et al. (1992) FEBS Lett. 303:4-10).
  • De novo synthesis of the pyrimidine nucleotides uridylate and cytidylate also arises from a common precursor, in this instance the nucleotide orotidylate derived from orotate and phosphoribosyl pyrophosphate (PPRP).
  • PPRP phosphoribosyl pyrophosphate
  • ATCase aspartate transcarbamylase
  • carbamyl phosphate synthetase II carbamyl phosphate synthetase II
  • DHOase dihydroorotase
  • Ligases forming carbon-carbon bonds include the carboxylases acetyl-CoA carboxylase and pyruvate carboxylase.
  • Acetyl-CoA carboxylase catalyzes the carboxylation of acetyl-CoA from CO2 and H2O using the energy of ATP hydrolysis.
  • Acetyl-CoA carboxylase is the rate-limiting step in the biogenesis of long-chain fatty acids.
  • Two isoforms of acetyl-CoA carboxylase, types I and II, are expressed in humans in a tissue-specific manner (Ha, J. et al. (1994) Eur. J. Biochem. 219:297-306).
  • Pyruvate carboxylase is a nuclear-encoded mitochondrial enzyme that catalyzes the conversion of pyruvate to oxaloacetate, a key intermediate in the citric acid cycle.
  • Ligases forming phosphoric ester bonds include the DNA ligases involved in both DNA replication and repair. DNA ligases seal phosphodiester bonds between two adjacent nucleotides in a DNA chain using the energy from ATP hydrolysis to first activate the free 5'-phosphate of one nucleotide and then react it with the 3'-OH group of the adjacent nucleotide.
  • This resealing reaction is used both in DNA replication, to join small DNA fragments called Okazaki fragments that are transiently formed in the process of replicating new DNA, and in DNA repair.
  • DNA repair is the process by which accidental base changes, such as those produced by oxidative damage, hydrolytic attack, or uncontrolled methylation of DNA, are corrected before replication or transcription of the DNA can occur.
  • Bloom's syndrome is an inherited human disease in which individuals are partially deficient in DNA ligation and consequently have an increased incidence of cancer (Alberts, B. et al. (1994) The Molecular Biology of the Cell, Garland Publishing Inc., New York NY, p. 247).
  • Cell division is the fundamental process by which all living things grow and reproduce. In unicellular organisms such as yeast and bacteria, each cell division doubles the number of organisms, while in multicellular species many rounds of cell division are required to replace cells lost by wear or by programmed cell death, and for cell differentiation to produce a new tissue or organ. Details of the cell division cycle may vary, but the basic process consists of three principal events. The first event, interphase, involves preparations for cell division, replication of the DNA, and production of essential proteins. In the second event, mitosis, the nuclear material is divided and separates to opposite sides of the cell. The final event, cytokinesis, is division and fission of the cell cytoplasm. The sequence and timing of cell cycle transitions is under the control of the cell cycle regulation system which controls the process by positive or negative regulatory circuits at various check points.
  • Regulated progression of the cell cycle depends on the integration of growth control pathways with the basic cell cycle machinery.
  • Cell cycle regulators have been identified by selecting for human and yeast cDNAs that block or activate cell cycle arrest signals in the yeast mating pheromone pathway when they are overexpressed.
  • Known regulators include human CPR (cell cycle progression restoration) genes, such as CPR8 and CPR2, and yeast CDC (cell division control) genes, including CDC91, that block the arrest signals.
  • The CPR genes express a variety of proteins including cyclins, tumor suppressor binding proteins, chaperones, transcription factors, translation factors, and RNA-binding proteins (Edwards, M.C. et al. (1997) Genetics 147:1063-1076).
  • Cdks cyclin-dependent kinases
  • The Cdks are composed of a kinase subunit, Cdk, and an activating subunit, cyclin, in a complex that is subject to many levels of regulation.
  • There appears to be a single Cdk in Saccharomyces cerevisiae and Schizosaccharomyces pombe, whereas mammals have a variety of specialized Cdks.
  • Cyclins act by binding to and activating cyclin-dependent protein kinases which then phosphorylate and activate selected proteins involved in the mitotic process.
  • The Cdk-cyclin complex is both positively and negatively regulated by phosphorylation, and by targeted degradation involving molecules such as CDC4 and CDC53.
  • Cdks are further regulated by binding to inhibitors and other proteins such as Suc1 that modify their specificity or accessibility to regulators (Patra, D. and W.G. Dunphy (1996) Genes Dev. 10:1503-1515; and Mathias, N. et al. (1996) Mol. Cell Biol. 16:6634-6643).
  • The male and female reproductive systems are complex and involve many aspects of growth and development.
  • The anatomy and physiology of the male and female reproductive systems are reviewed in Guyton, A.C. (1991) Textbook of Medical Physiology, W.B. Saunders Co., Philadelphia PA, pp. 899-928.
  • The male reproductive system includes the process of spermatogenesis, in which the sperm are formed, and male reproductive functions are regulated by various hormones and their effects on accessory sexual organs, cellular metabolism, growth, and other bodily functions.
  • Spermatogenesis begins at puberty as a result of stimulation by gonadotropic hormones released from the anterior pituitary. Immature sperm (spermatogonia) undergo several mitotic cell divisions before undergoing meiosis and full maturation. The testes secrete several male sex hormones, the most abundant being testosterone, which is essential for growth and division of the immature sperm, and for the masculine characteristics of the male body. Three other male sex hormones, gonadotropin-releasing hormone (GnRH), luteinizing hormone (LH), and follicle-stimulating hormone (FSH) control sexual function.
  • GnRH gonadotropin-releasing hormone
  • LH luteinizing hormone
  • FSH follicle-stimulating hormone
  • The uterus, ovaries, fallopian tubes, vagina, and breasts comprise the female reproductive system.
  • The ovaries and uterus are the source of ova and the location of fetal development, respectively.
  • The fallopian tubes and vagina are accessory organs attached to the top and bottom of the uterus, respectively. Both the uterus and ovaries have additional roles in the development and loss of reproductive capability during a female's lifetime.
  • The primary role of the breasts is lactation.
  • Endocrine signals from the ovaries, uterus, pituitary, hypothalamus, adrenal glands, and other tissues coordinate reproduction and lactation. These signals vary during the monthly menstruation cycle and during the female's lifetime. Similarly, the sensitivity of reproductive organs to these endocrine signals varies during the female's lifetime.
  • A combination of positive and negative feedback to the ovaries, pituitary and hypothalamus glands controls physiologic changes during the monthly ovulation and endometrial cycles.
  • The anterior pituitary secretes two major gonadotropin hormones, follicle-stimulating hormone (FSH) and luteinizing hormone (LH), regulated by negative feedback of steroids, most notably by ovarian estradiol. If fertilization does not occur, estrogen and progesterone levels decrease. This sudden reduction of the ovarian hormones leads to menstruation, the desquamation of the endometrium.
  • FSH follicle-stimulating hormone
  • LH luteinizing hormone
  • Hormones further govern all the steps of pregnancy, parturition, lactation, and menopause.
  • hCG human chorionic gonadotropin
  • estrogens progesterone
  • hCS human chorionic somatomammotropin
  • hCG, a glycoprotein similar to luteinizing hormone, stimulates the corpus luteum to continue producing more progesterone and estrogens, rather than to involute as occurs if the ovum is not fertilized.
  • hCS is similar to growth hormone and is crucial for fetal nutrition.
  • The female breast also matures during pregnancy.
  • Large amounts of estrogen secreted by the placenta trigger growth and branching of the breast milk ductal system while lactation is initiated by the secretion of prolactin by the pituitary gland.
  • Parturition involves several hormonal changes that increase uterine contractility toward the end of pregnancy, as follows.
  • The levels of estrogens increase more than those of progesterone.
  • Oxytocin is secreted by the neurohypophysis. Concomitantly, uterine sensitivity to oxytocin increases.
  • The fetus itself secretes oxytocin, cortisol (from adrenal glands), and prostaglandins.
  • Menopause occurs when most of the ovarian follicles have degenerated.
  • The ovary then produces less estradiol, reducing the negative feedback on the pituitary and hypothalamus glands.
  • Mean levels of circulating FSH and LH increase, even as ovulatory cycles continue. Therefore, the ovary is less responsive to gonadotropins, and there is an increase in the time between menstrual cycles. Consequently, menstrual bleeding ceases and reproductive capability ends.
Cell Differentiation and Proliferation
  • Tissue growth involves complex and ordered patterns of cell proliferation, cell differentiation, and apoptosis. Cell proliferation must be regulated to maintain both the number of cells and their spatial organization.
  • This regulation depends upon the appropriate expression of proteins which control cell cycle progression in response to extracellular signals, such as growth factors and other mitogens, and intracellular cues, such as DNA damage or nutrient starvation.
  • Molecules which directly or indirectly modulate cell cycle progression fall into several categories, including growth factors and their receptors, second messenger and signal transduction proteins, oncogene products, tumor-suppressor proteins, and mitosis-promoting factors.
  • Growth factors were originally described as serum factors required to promote cell proliferation. Most growth factors are large, secreted polypeptides that act on cells in their local environment. Growth factors bind to and activate specific cell surface receptors and initiate intracellular signal transduction cascades. Many growth factor receptors are classified as receptor tyrosine kinases which undergo autophosphorylation upon ligand binding. Autophosphorylation enables the receptor to interact with signal transduction proteins characterized by the presence of SH2 or SH3 domains (Src homology regions 2 or 3).
  • G-proteins such as Ras, Rab, and Rho
  • GAPs GTPase activating proteins
  • GNRPs guanine nucleotide releasing proteins
  • Small G proteins act as molecular switches that activate other downstream events, such as mitogen-activated protein kinase (MAP kinase) cascades.
  • MAP kinases ultimately activate transcription of mitosis-promoting genes.
  • Small signaling peptides and hormones also influence cell proliferation.
  • GPCR trimeric G-protein coupled receptor
  • Upon ligand binding, the GPCR activates a trimeric G protein which in turn triggers increased levels of intracellular second messengers such as phospholipase C, Ca2+, and cyclic AMP.
  • Most GPCR-mediated signaling pathways indirectly promote cell proliferation by causing the secretion or breakdown of other signaling molecules that have direct mitogenic effects. These signaling cascades often involve activation of kinases and phosphatases.
  • Some growth factors, such as some members of the transforming growth factor beta (TGF-β) family, act on some cells to stimulate cell proliferation and on other cells to inhibit it. Growth factors may also stimulate a cell at one concentration and inhibit the same cell at another concentration.
  • TNF/NGF tumor necrosis factor/nerve growth factor
  • The cell response depends on the type of cell, its stage of differentiation and transformation status, which surface receptors are stimulated, and the types of stimuli acting on the cell (Smith, A. et al. (1994) Cell 76:959-962; and Nocentini, G. et al. (1997) Proc. Natl. Acad. Sci. USA 94:6216-6221).
  • ECM extracellular matrix
  • ECM molecules such as laminin or fibronectin
  • Tenascin-C and -R, expressed in developing and lesioned neural tissue, provide stimulatory/anti-adhesive or inhibitory properties, respectively, for axonal growth (Faissner, A. (1997) Cell Tissue Res. 290:331-341).
  • Cancers are associated with the activation of oncogenes which are derived from normal cellular genes. These oncogenes encode oncoproteins which convert normal cells into malignant cells. Some oncoproteins are mutant isoforms of the normal protein, and other oncoproteins are abnormally expressed with respect to location or amount of expression. The latter category of oncoprotein causes cancer by altering transcriptional control of cell proliferation.
  • Five classes of oncoproteins are known to affect cell cycle controls. These classes include growth factors, growth factor receptors, intracellular signal transducers, nuclear transcription factors, and cell-cycle control proteins.
  • Viral oncogenes are integrated into the human genome after infection of human cells by certain viruses. Examples of viral oncogenes include v-src, v-abl, and v-fps.
  • Oncogenes have been identified and characterized. These include sis, erbA, erbB, her-2, mutated Gs, src, abl, ras, crk, jun, fos, myc, and mutated tumor-suppressor genes such as RB, p53, mdm2, Cip1, p16, and cyclin D. Transformation of normal genes to oncogenes may also occur by chromosomal translocation.
  • The Philadelphia chromosome, characteristic of chronic myeloid leukemia and a subset of acute lymphoblastic leukemias, results from a reciprocal translocation between chromosomes 9 and 22 that moves a truncated portion of the proto-oncogene c-abl to the breakpoint cluster region (bcr) on chromosome 22.
  • Tumor-suppressor genes are involved in regulating cell proliferation. Mutations which cause reduced or loss of function in tumor-suppressor genes result in uncontrolled cell proliferation.
  • The retinoblastoma gene product (RB), in a non-phosphorylated state, binds several early-response genes and suppresses their transcription, thus blocking cell division. Phosphorylation of RB causes it to dissociate from the genes, releasing the suppression, and allowing cell division to proceed.
Apoptosis
  • Apoptosis is the genetically controlled process by which unneeded or defective cells undergo programmed cell death. Selective elimination of cells is as important for morphogenesis and tissue remodeling as is cell proliferation and differentiation. Lack of apoptosis may result in hyperplasia and other disorders associated with increased cell proliferation. Apoptosis is also a critical component of the immune response. Immune cells such as cytotoxic T-cells and natural killer cells prevent the spread of disease by inducing apoptosis in tumor cells and virus-infected cells. In addition, immune cells that fail to distinguish self molecules from foreign molecules must be eliminated by apoptosis to avoid an autoimmune response.
  • Apoptosis includes cell shrinkage, nuclear and cytoplasmic condensation, and alterations in plasma membrane topology. Biochemically, apoptotic cells are characterized by increased intracellular calcium concentration, fragmentation of chromosomal DNA, and expression of novel cell surface components.
  • Apoptosis generally proceeds in response to a signal which is transduced intracellularly and results in altered patterns of gene expression and protein activity.
  • Signaling molecules such as hormones and cytokines are known both to stimulate and to inhibit apoptosis through interactions with cell surface receptors. Transcription factors also play an important role in the onset of apoptosis.
  • A number of downstream effector molecules, particularly proteases such as the cysteine proteases called caspases, have been implicated in the degradation of cellular components and the proteolytic activation of other apoptotic effectors.
  • Biochemical pathways are responsible for regulating metabolism, growth and development, protein secretion and trafficking, environmental responses, and ecological interactions including immune response and response to parasites.
  • DNA Deoxyribonucleic acid
  • DNA the genetic material
  • The bulk of human DNA is nuclear, in the form of linear chromosomes, while mitochondrial DNA is circular.
  • DNA replication begins at specific sites called origins of replication. Bidirectional synthesis occurs from the origin via two growing forks that move in opposite directions. Replication is semi-conservative, with each daughter duplex containing one old strand and its newly synthesized complementary partner.
  • Proteins involved in DNA replication include DNA polymerases, DNA primase, telomerase, DNA helicase, topoisomerases, DNA ligases, replication factors, and DNA-binding proteins.
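The semi-conservative scheme described above can be illustrated with a minimal Python sketch. It treats each strand as a plain string of bases written 5' to 3'; the function names `complement_strand` and `replicate` are hypothetical and chosen only for this illustration, and the model ignores origins, forks, and the replication proteins listed above.

```python
# Illustrative sketch only: strands are plain strings written 5'->3'.
# Watson-Crick base pairing for DNA.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(strand: str) -> str:
    """Return the complementary strand, also written 5'->3' (reverse complement)."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def replicate(duplex):
    """Return two daughter duplexes, each keeping one parental (old) strand.

    `duplex` is a (top, bottom) pair of complementary strands.
    """
    old_top, old_bottom = duplex
    daughter_1 = (old_top, complement_strand(old_top))        # old top + new bottom
    daughter_2 = (complement_strand(old_bottom), old_bottom)  # new top + old bottom
    return daughter_1, daughter_2


parent = ("ATGCGT", complement_strand("ATGCGT"))
d1, d2 = replicate(parent)
```

Each daughter duplex in this sketch carries the same sequence as the parent, with exactly one of its two strands inherited unchanged from the parental duplex.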
DNA Recombination and Repair
  • Cells are constantly faced with replication errors and environmental assault (such as ultraviolet irradiation) that can produce DNA damage.
  • Damage to DNA consists of any change that modifies the structure of the molecule. Changes to DNA can be divided into two general classes, single base changes and structural distortions. Any damage to DNA can produce a mutation, and the mutation may produce a disorder, such as cancer.
  • Repair systems can be divided into three general types: direct repair, excision repair, and retrieval systems. Proteins involved in DNA repair include DNA polymerase, excision repair proteins, excision and cross link repair proteins, recombination and repair proteins, RAD51 proteins, and BLM and WRN proteins that are homologs of RecQ helicase. When the repair systems are eliminated, cells become exceedingly sensitive to environmental mutagens, such as ultraviolet irradiation. Patients with disorders associated with a loss in DNA repair systems often exhibit a high sensitivity to environmental mutagens.
  • XP xeroderma pigmentosum
  • BS Bloom's syndrome
  • WS Werner's syndrome
  • Recombination is the process whereby new DNA sequences are generated by the movements of large pieces of DNA.
  • homologous recombination which occurs during meiosis and DNA repair, parent DNA duplexes align at regions of sequence similarity, and new DNA molecules form by the breakage and joining of homologous segments.
  • Proteins involved include RAD51 recombinase.
  • In site-specific recombination, two specific but not necessarily homologous DNA sequences are exchanged.
  • this process generates a diverse collection of antibody and T cell receptor genes.
  • Proteins involved in site-specific recombination in the immune system include recombination activating genes 1 and 2 (RAG1 and RAG2).
  • a defect in immune system site-specific recombination causes severe combined immunodeficiency disease in mice.
  • RNA Ribonucleic acid
  • RNA is transcribed as a copy of DNA, the genetic material of the organism.
  • DNA serves as the genetic material.
  • RNA copies of the genetic material encode proteins or serve various structural, catalytic, or regulatory roles in organisms.
  • RNA is classified according to its cellular localization and function.
  • Messenger RNAs (mRNAs) encode polypeptides.
  • Ribosomal RNAs (rRNAs) are assembled, along with ribosomal proteins, into ribosomes, which are cytoplasmic particles that translate mRNA into polypeptides.
  • Transfer RNAs are cytosolic adaptor molecules that function in mRNA translation by recognizing both an mRNA codon and the amino acid that matches that codon.
  • Heterogeneous nuclear RNAs include mRNA precursors and other nuclear RNAs of various sizes.
  • Small nuclear RNAs are a part of the nuclear spliceosome complex that removes intervening, non-coding sequences (introns) and rejoins exons in pre-mRNAs.
  • RNA Transcription
  • The transcription process synthesizes an RNA copy of DNA. Proteins involved include multi-subunit RNA polymerases and transcription factors IIA, IIB, IID, IIE, IIF, IIH, and IIJ.
  • DNA-binding structural motifs comprise either α-helices or β-sheets that bind to the major groove of DNA.
  • Four well-characterized structural motifs are helix-turn-helix, zinc finger, leucine zipper, and helix-loop-helix.
  • RNAs are necessary for processing of transcribed RNAs in the nucleus.
  • Pre-mRNA processing steps include capping at the 5' end with methylguanosine, polyadenylating the 3' end, and splicing to remove introns.
  • The spliceosomal complex is composed of five small nuclear ribonucleoprotein particles (snRNPs) designated U1, U2, U4, U5, and U6.
  • Each snRNP contains a single species of snRNA and about ten proteins.
  • the RNA components of some snRNPs recognize and base-pair with intron consensus sequences.
  • the protein components mediate spliceosome assembly and the splicing reaction.
  • snRNP proteins are found in the blood of patients with systemic lupus erythematosus (Stryer, L. (1995) Biochemistry W.H. Freeman and Company, New York NY, p. 863).
  • hnRNPs Heterogeneous nuclear ribonucleoproteins
  • hnRNPs include the yeast proteins Hrp1p, involved in cleavage and polyadenylation at the 3' end of the RNA; Cbp80p, involved in capping the 5' end of the RNA; and Npl3p, a homolog of mammalian hnRNP A1, involved in export of mRNA from the nucleus (Shen, E.C. et al. (1998) Genes Dev. 12:679-691). HnRNPs have been shown to be important targets of the autoimmune response in rheumatic diseases (Biamonti, supra). Many snRNP proteins, hnRNP proteins, and alternative splicing factors are characterized by an RNA recognition motif (RRM).
  • RRM RNA recognition motif
  • The RRM is about 80 amino acids in length and forms four β-strands and two α-helices arranged in an α/β sandwich.
  • the RRM contains a core RNP-1 octapeptide motif along with surrounding conserved sequences.
  • RNA helicases alter and regulate RNA conformation and secondary structure by using energy derived from ATP hydrolysis to destabilize and unwind RNA duplexes.
  • The most well-characterized and ubiquitous family of RNA helicases is the DEAD-box family, so named for the conserved B-type ATP-binding motif which is diagnostic of proteins in this family. DEAD-box helicases have been identified in organisms as diverse as bacteria, insects, yeast, amphibians, mammals, and plants.
  • DEAD-box helicases function in diverse processes such as translation initiation, splicing, ribosome assembly, and RNA editing, transport, and stability.
  • Some DEAD-box helicases play tissue- and stage-specific roles in spermatogenesis and embryogenesis. (Reviewed in Linder, P. et al. (1989) Nature 337:121-122.)
  • DEAD-box 1 protein may play a role in the progression of neuroblastoma (Nb) and retinoblastoma (Rb) tumors.
  • DEAD-box helicases have been implicated either directly or indirectly in ultraviolet light-induced tumors, B cell lymphoma, and myeloid malignancies. (Reviewed in Godbout, R. et al. (1998) J. Biol. Chem. 273:21161-21168.)
  • Ribonucleases (RNases) catalyze the hydrolysis of phosphodiester bonds in RNA chains, thus cleaving the RNA.
  • RNase P is a ribonucleoprotein enzyme which cleaves the 5' end of pre-tRNAs as part of their maturation process.
  • RNase H digests the RNA strand of an RNA/DNA hybrid. Such hybrids occur in cells invaded by retroviruses, and RNase H is an important enzyme in the retroviral replication cycle.
  • RNase H is often found as a domain associated with reverse transcriptases.
  • RNase activity in serum and cell extracts is elevated in a variety of cancers and infectious diseases (Schein, C.H. (1997) Nat. Biotechnol. 15:529-536). Regulation of RNase activity is being investigated as a means to control tumor angiogenesis, allergic reactions, viral infection and replication, and fungal infections.
  • Protein Translation
  • the eukaryotic ribosome is composed of a 60S (large) subunit and a 40S (small) subunit, which together form the 80S ribosome.
  • the ribosome also contains more than fifty proteins.
  • the ribosomal proteins have a prefix which denotes the subunit to which they belong, either L (large) or S (small).
  • Three important sites are identified on the ribosome.
  • the aminoacyl-tRNA site (A site) is where charged tRNAs (with the exception of the initiator-tRNA) bind on arrival at the ribosome.
  • the peptidyl-tRNA site (P site) is where new peptide bonds are formed, as well as where the initiator tRNA binds.
  • The exit site (E site) is where deacylated tRNAs bind prior to their release from the ribosome. (Translation is reviewed in Stryer, L. (1995) Biochemistry, W.H. Freeman and Company, New York NY, pp. 875-908; and Lodish, H. et al. (1995) Molecular Cell Biology, Scientific American Books, New York NY, pp. 119-138.)
  • tRNA Charging
  • Protein biosynthesis depends on each amino acid forming a linkage with the appropriate tRNA.
  • the aminoacyl-tRNA synthetases are responsible for the activation and correct attachment of an amino acid with its cognate tRNA.
  • The 20 aminoacyl-tRNA synthetase enzymes can be divided into two structural classes, Class I and Class II. Autoantibodies against aminoacyl-tRNAs are generated by patients with dermatomyositis and polymyositis, and correlate strongly with complicating interstitial lung disease (ILD). These antibodies appear to be generated in response to viral infection, and coxsackie virus has been used to induce experimental viral myositis in animals.
  • ILD interstitial lung disease
  • Initiation of translation can be divided into three stages.
  • The first stage brings an initiator transfer RNA (Met-tRNAf) together with the 40S ribosomal subunit to form the 43S preinitiation complex.
  • The second stage binds the 43S preinitiation complex to the mRNA, followed by migration of the complex to the correct AUG initiation codon.
  • The third stage brings the 60S ribosomal subunit to the 40S subunit to generate an 80S ribosome at the initiation codon.
  • Regulation of translation primarily involves the first and second stage in the initiation process (Pain, V.M. (1996) Eur. J. Biochem. 236:747-771).
  • eIF2 a guanine nucleotide binding protein
  • eIF2B a guanine nucleotide exchange protein
  • eIF1A and eIF3 bind and stabilize the 40S subunit by interacting with 18S ribosomal RNA and specific ribosomal structural proteins.
  • eIF3 is also involved in association of the 40S ribosomal subunit with mRNA.
  • Met-tRNAf, eIF1A, eIF3, and the 40S ribosomal subunit together make up the 43S preinitiation complex (Pain, supra).
  • eIF4F is a complex consisting of three proteins: eIF4E, eIF4A, and eIF4G.
  • eIF4E recognizes and binds to the mRNA 5'-terminal m7GTP cap.
  • eIF4A is a bidirectional RNA-dependent helicase
  • eIF4G is a scaffolding polypeptide.
  • eIF4G has three binding domains.
  • eIF4G acts as a bridge between the 40S ribosomal subunit and the mRNA (Hentze, M.W. (1997) Science 275:500-501).
  • The ability of eIF4F to initiate binding of the 43S preinitiation complex is regulated by structural features of the mRNA.
  • The mRNA molecule has an untranslated region (UTR) between the 5' cap and the AUG start codon. In some mRNAs this region forms secondary structures that impede binding of the 43S preinitiation complex.
  • the helicase activity of eIF4A is thought to function in removing this secondary structure to facilitate binding of the 43S preinitiation complex (Pain, supra).
  • Translation Elongation
  • Elongation is the process whereby additional amino acids are joined to the initiator methionine to form the complete polypeptide chain.
  • The elongation factors EF1α, EF1βγ, and EF2 are involved in elongating the polypeptide chain following initiation.
  • EF1α is a GTP-binding protein. In its GTP-bound form, EF1α brings an aminoacyl-tRNA to the ribosome's A site.
  • The amino acid attached to the newly arrived aminoacyl-tRNA forms a peptide bond with the initiator methionine.
  • The GTP on EF1α is hydrolyzed to GDP, and EF1α-GDP dissociates from the ribosome.
  • EF1βγ binds EF1α-
  • EF-G, another GTP-binding protein, catalyzes the translocation of tRNAs from the A site to the P site and finally to the E site of the ribosome. This allows the processivity of translation.
  • the release factor eRF carries out termination of translation. eRF recognizes stop codons in the mRNA, leading to the release of the polypeptide chain from the ribosome.
  • Proteins may be modified after translation by the addition of phosphate, sugar, prenyl, fatty acid, and other chemical groups. These modifications are often required for proper protein activity.
  • Enzymes involved in post-translational modification include kinases, phosphatases, glycosyltransferases, and prenyltransferases.
  • The conformation of proteins may also be modified after translation by the introduction and rearrangement of disulfide bonds (rearrangement catalyzed by protein disulfide isomerase), the isomerization of proline sidechains by prolyl isomerase, and by interactions with molecular chaperone proteins.
  • Proteins may also be cleaved by proteases. Such cleavage may result in activation, inactivation, or complete degradation of the protein.
  • proteases include serine proteases, cysteine proteases, aspartic proteases, and metalloproteases.
  • Signal peptidase in the endoplasmic reticulum (ER) lumen cleaves the signal peptide from membrane or secretory proteins that are imported into the ER.
  • Ubiquitin proteases are associated with the ubiquitin conjugation system (UCS), a major pathway for the degradation of cellular proteins in eukaryotic cells and some bacteria.
  • UCS ubiquitin conjugation system
  • the UCS mediates the elimination of abnormal proteins and regulates the half-lives of important regulatory proteins that control cellular processes such as gene transcription and cell cycle progression.
  • Proteins targeted for degradation are conjugated to ubiquitin, a small heat-stable protein.
  • Proteins involved in the UCS include ubiquitin-activating enzyme, ubiquitin-conjugating enzymes, ubiquitin ligases, and ubiquitin C-terminal hydrolases.
  • the ubiquitinated protein is then recognized and degraded by the proteasome, a large, multisubunit proteolytic enzyme complex, and ubiquitin is released for reutilization by ubiquitin protease.
  • Lipid Metabolism
  • Lipids are water-insoluble, oily or greasy substances that are soluble in nonpolar solvents such as chloroform or ether. Neutral fats (triacylglycerols) serve as major fuels and energy stores. Polar lipids, such as phospholipids, sphingolipids, glycolipids, and cholesterol, are key structural components of cell membranes. Lipid metabolism is involved in human diseases and disorders. In the arterial disease atherosclerosis, fatty lesions form on the inside of the arterial wall. These lesions promote the loss of arterial flexibility and the formation of blood clots (Guyton, A.C. (1991) Textbook of Medical Physiology, W.B. Saunders Company, Philadelphia PA, pp. 760-763).
  • The GM2 ganglioside (a sphingolipid) accumulates in lysosomes of the central nervous system due to a lack of the enzyme N-acetylhexosaminidase.
  • Patients suffer nervous system degeneration leading to early death (Fauci, A.S. et al. (1998) Harrison's Principles of Internal Medicine McGraw-Hill, New York NY, p. 2171).
  • The Niemann-Pick diseases are caused by defects in lipid metabolism.
  • Niemann-Pick diseases types A and B are caused by accumulation of sphingomyelin (a sphingolipid) and other lipids in the central nervous system due to a defect in the enzyme sphingomyelinase, leading to neurodegeneration and lung disease.
  • Niemann-Pick disease type C results from a defect in cholesterol transport, leading to the accumulation of sphingomyelin and cholesterol in lysosomes and a secondary reduction in sphingomyelinase activity.
  • Neurological symptoms such as grand mal seizures, ataxia, and loss of previously learned speech, manifest 1-2 years after birth.
  • NPC protein which contains a putative cholesterol-sensing domain
  • a mouse model of Niemann-Pick disease type C (Fauci, supra, p. 2175; Loftus, S.K. et al. (1997) Science 277:232-235).
  • Lipid metabolism is reviewed in Stryer, L. (1995) Biochemistry, W.H. Freeman and Company, New York NY; Lehninger, A.
  • Fatty acids are long-chain organic acids with a single carboxyl group and a long non-polar hydrocarbon tail.
  • Long-chain fatty acids are essential components of glycolipids, phospholipids, and cholesterol, which are building blocks for biological membranes, and of triglycerides, which are biological fuel molecules.
  • Long-chain fatty acids are also substrates for eicosanoid production, and are important in the functional modification of certain complex carbohydrates and proteins. 16-carbon and 18-carbon fatty acids are the most common.
  • Fatty acid synthesis occurs in the cytoplasm.
  • acetyl-Coenzyme A (CoA) carboxylase (ACC) synthesizes malonyl-CoA from acetyl-CoA and bicarbonate.
  • the enzymes which catalyze the remaining reactions are covalently linked into a single polypeptide chain, referred to as the multifunctional enzyme fatty acid synthase (FAS).
  • FAS catalyzes the synthesis of palmitate from acetyl-CoA and malonyl-CoA.
  • FAS contains acetyl transferase, malonyl transferase, β-ketoacyl synthase, acyl carrier protein, β-ketoacyl reductase, dehydratase, enoyl reductase, and thioesterase activities.
  • The final product of the FAS reaction is the 16-carbon fatty acid palmitate. Further elongation, as well as unsaturation, of palmitate by accessory enzymes of the ER produces the variety of long chain fatty acids required by the individual cell. These enzymes include a NADH-cytochrome b5 reductase, cytochrome b5, and a desaturase.
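  • The synthesis of palmitate from acetyl-CoA and malonyl-CoA can be tallied with a short Python sketch. It assumes the standard textbook stoichiometry (one acetyl-CoA primer, one malonyl-CoA per two-carbon extension, and two NADPH per extension cycle), which the text above does not itself spell out; the function name `fas_requirements` is hypothetical.

```python
def fas_requirements(carbons: int) -> dict:
    """Tally substrates for building a saturated, even-chain fatty acid.

    Assumes textbook stoichiometry: one acetyl-CoA primer, then one
    malonyl-CoA and two NADPH for every two-carbon extension cycle.
    """
    if carbons < 4 or carbons % 2 != 0:
        raise ValueError("sketch covers even-chain fatty acids of 4+ carbons")
    cycles = carbons // 2 - 1  # the primer supplies the first two carbons
    return {"acetyl_coa": 1, "malonyl_coa": cycles, "nadph": 2 * cycles}


print(fas_requirements(16))  # palmitate
```

  • For the 16-carbon product palmitate, this gives one acetyl-CoA, seven malonyl-CoA, and fourteen NADPH.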
  • Triacylglycerols, also known as triglycerides and neutral fats, are major energy stores in animals. Triacylglycerols are esters of glycerol with three fatty acid chains. Glycerol-3-phosphate is produced from dihydroxyacetone phosphate by the enzyme glycerol phosphate dehydrogenase or from glycerol by glycerol kinase. Fatty acyl-CoAs are produced from fatty acids by fatty acyl-CoA synthetases. Glycerol-3-phosphate is acylated with two fatty acyl-CoAs by the enzyme glycerol phosphate acyltransferase to give phosphatidate.
  • Phosphatidate phosphatase converts phosphatidate to diacylglycerol, which is subsequently acylated to a triacylglycerol by the enzyme diglyceride acyltransferase. Phosphatidate phosphatase and diglyceride acyltransferase form a triacylglycerol synthetase complex bound to the ER membrane.
  • A major class of phospholipids is the phosphoglycerides, which are composed of a glycerol backbone, two fatty acid chains, and a phosphorylated alcohol.
  • Phosphoglycerides are components of cell membranes. Principal phosphoglycerides are phosphatidyl choline, phosphatidyl ethanolamine, phosphatidyl serine, phosphatidyl inositol, and diphosphatidyl glycerol.
  • Many enzymes involved in phosphoglyceride synthesis are associated with membranes (Meyers, R.A. (1995) Molecular Biology and Biotechnology, VCH Publishers Inc., New York NY, pp. 494-501).
  • Phosphatidate is converted to CDP-diacylglycerol by the enzyme phosphatidate cytidylyltransferase (ExPASy ENZYME EC 2.7.7.41). Transfer of the diacylglycerol group from CDP-diacylglycerol to serine to yield phosphatidyl serine, or to inositol to yield phosphatidyl inositol, is catalyzed by the enzymes CDP-diacylglycerol-serine O-phosphatidyltransferase and CDP-diacylglycerol-inositol 3-phosphatidyltransferase, respectively (ExPASy ENZYME EC 2.7.8.8; ExPASy ENZYME EC 2.7.8.11).
  • the enzyme phosphatidyl serine decarboxylase catalyzes the conversion of phosphatidyl serine to phosphatidyl ethanolamine, using a pyruvate cofactor (Voelker, D.R. (1997) Biochim. Biophys. Acta 1348:236-244).
  • Phosphatidyl choline is formed using diet-derived choline by the reaction of CDP-choline with 1,2-diacylglycerol, catalyzed by diacylglycerol cholinephosphotransferase (ExPASy ENZYME EC 2.7.8.2).
  • Cholesterol, composed of four fused hydrocarbon rings with an alcohol at one end, moderates the fluidity of membranes in which it is incorporated.
  • Cholesterol is used in the synthesis of steroid hormones such as cortisol, progesterone, estrogen, and testosterone.
  • Bile salts derived from cholesterol facilitate the digestion of lipids.
  • Cholesterol in the skin forms a barrier that prevents excess water evaporation from the body.
  • Farnesyl and geranylgeranyl groups, which are derived from cholesterol biosynthesis intermediates, are post-translationally added to signal transduction proteins such as ras and protein-targeting proteins such as rab. These modifications are important for the activities of these proteins (Guyton, supra; Stryer, supra, pp. 279-280, 691-702, 934).
  • HMG-CoA hydroxymethylglutaryl-CoA
  • The rate-limiting step is the conversion of HMG-CoA to mevalonate by HMG-CoA reductase.
  • the drug lovastatin, a potent inhibitor of HMG-CoA reductase, is given to patients to reduce their serum cholesterol levels.
  • Mevalonate pathway enzymes include mevalonate kinase, phosphomevalonate kinase, diphosphomevalonate decarboxylase, isopentenyldiphosphate isomerase, dimethylallyl transferase, geranyl transferase, farnesyl-diphosphate farnesyltransferase, squalene monooxygenase, lanosterol synthase, lathosterol oxidase, and 7-dehydrocholesterol reductase.
  • Cholesterol is used in the synthesis of steroid hormones such as cortisol, progesterone, aldosterone, estrogen, and testosterone.
  • cholesterol is converted to pregnenolone by cholesterol monooxygenases.
  • The other steroid hormones are synthesized from pregnenolone by a series of enzyme-catalyzed reactions including oxidations, isomerizations, hydroxylations, reductions, and demethylations. Examples of these enzymes include steroid Δ-isomerase, 3β-hydroxy-Δ5-steroid dehydrogenase, steroid 21-monooxygenase, steroid 19-hydroxylase, and 3 ⁇ -hydroxysteroid dehydrogenase. Cholesterol is also the precursor to vitamin D.
  • Isoprenoid groups are found in vitamin K, ubiquinone, retinal, dolichol phosphate (a carrier of oligosaccharides needed for N-linked glycosylation), and farnesyl and geranylgeranyl groups that modify proteins. Enzymes involved include farnesyl transferase, polyprenyl transferases, dolichyl phosphatase, and dolichyl kinase.
  • Sphingolipid Metabolism
  • Sphingolipids are an important class of membrane lipids that contain sphingosine, a long chain amino alcohol.
  • Sphingolipids are composed of one long-chain fatty acid, one polar head alcohol, and sphingosine or a sphingosine derivative.
  • The three classes of sphingolipids are sphingomyelins, cerebrosides, and gangliosides.
  • Sphingomyelins, which contain phosphocholine or phosphoethanolamine as their head group, are abundant in the myelin sheath surrounding nerve cells.
  • Galactocerebrosides, which contain a glucose or galactose head group, are characteristic of the brain.
  • Other cerebrosides are found in nonneural tissues.
  • Gangliosides, whose head groups contain multiple sugar units, are abundant in the brain, but are also found in nonneural tissues.
  • Sphingolipids are built on a sphingosine backbone. Sphingosine is acylated to ceramide by the enzyme sphingosine acetyltransferase. Ceramide and phosphatidyl choline are converted to sphingomyelin by the enzyme ceramide choline phosphotransferase. Cerebrosides are synthesized by the linkage of glucose or galactose to ceramide by a transferase. Sequential addition of sugar residues to ceramide by transferase enzymes yields gangliosides.
  • Eicosanoid Metabolism
  • Eicosanoids, including prostaglandins, prostacyclin, thromboxanes, and leukotrienes, are 20-carbon molecules derived from fatty acids. Eicosanoids are signaling molecules which have roles in pain, fever, and inflammation. The precursor of all eicosanoids is arachidonate, which is generated from phospholipids by phospholipase A2 and from diacylglycerols by diacylglycerol lipase. Leukotrienes are produced from arachidonate by the action of lipoxygenases. Prostaglandin synthase, reductases, and isomerases are responsible for the synthesis of the prostaglandins.
  • Prostaglandins have roles in inflammation, blood flow, ion transport, synaptic transmission, and sleep.
  • Prostacyclin and the thromboxanes are derived from a precursor prostaglandin by the action of prostacyclin synthase and thromboxane synthases, respectively.
  • Acetyl-CoA molecules derived from fatty acid oxidation in the liver can condense to form acetoacetyl-CoA, which subsequently forms acetoacetate, D-3-hydroxybutyrate, and acetone.
  • These three products are known as ketone bodies.
  • Enzymes involved in ketone body metabolism include HMG-CoA synthetase, HMG-CoA cleavage enzyme, D-3-hydroxybutyrate dehydrogenase, acetoacetate decarboxylase, and 3-ketoacyl-CoA transferase.
  • Ketone bodies are a normal fuel supply of the heart and renal cortex.
  • Acetoacetate produced by the liver is transported to cells where the acetoacetate is converted back to acetyl-CoA and enters the citric acid cycle.
  • Ketone bodies produced from stored triacylglycerols become an important fuel source, especially for the brain. Abnormally high levels of ketone bodies are observed in diabetics. Diabetic coma can result if ketone body levels become too great.
  • Lipid Mobilization
  • Within cells, fatty acids are transported by cytoplasmic fatty acid binding proteins (Online
  • Diazepam binding inhibitor (DBI), also known as endozepine and acyl CoA-binding protein, is an endogenous γ-aminobutyric acid (GABA) receptor ligand which is thought to down-regulate the effects of GABA.
  • GABA γ-aminobutyric acid
  • DBI binds medium- and long-chain acyl-CoA esters with very high affinity and may function as an intracellular carrier of acyl-CoA esters (OMIM *125950 Diazepam Binding Inhibitor; DBI; PROSITE PDOC00686 Acyl-CoA-binding protein signature).
  • Fat stored in liver and adipose triglycerides may be released by hydrolysis and transported in the blood. Free fatty acids are transported in the blood by albumin. Triacylglycerols and cholesterol esters in the blood are transported in lipoprotein particles. The particles consist of a core of hydrophobic lipids surrounded by a shell of polar lipids and apolipoproteins. The protein components serve in the solubilization of hydrophobic lipids and also contain cell-targeting signals.
  • Lipoproteins include chylomicrons, chylomicron remnants, very-low-density lipoproteins (VLDL), intermediate-density lipoproteins (IDL), low-density lipoproteins (LDL), and high-density lipoproteins (HDL).
  • VLDL very-low-density lipoproteins
  • IDL intermediate-density lipoproteins
  • LDL low-density lipoproteins
  • HDL high-density lipoproteins
  • Triacylglycerols in chylomicrons and VLDL are hydrolyzed by lipoprotein lipases that line blood vessels in muscle and other tissues that use fatty acids.
  • Cell surface LDL receptors bind LDL particles which are then internalized by endocytosis. Absence of the LDL receptor, the cause of the disease familial hypercholesterolemia, leads to increased plasma cholesterol levels and ultimately to
  • Plasma cholesteryl ester transfer protein mediates the transfer of cholesteryl esters from HDL to apolipoprotein B-containing lipoproteins. Cholesteryl ester transfer protein is important in the reverse cholesterol transport system and may play a role in atherosclerosis (Yamashita, S. et al. (1997) Curr. Opin. Lipidol. 8:101-110). Macrophage scavenger receptors, which bind and internalize modified lipoproteins, play a role in lipid transport and may contribute to atherosclerosis (Greaves,
  • SREBP sterol regulatory element binding protein
  • OSBP oxysterol-binding protein
  • Mitochondrial and peroxisomal beta-oxidation enzymes degrade saturated and unsaturated fatty acids by sequential removal of two-carbon units from CoA-activated fatty acids.
  • The main beta-oxidation pathway degrades both saturated and unsaturated fatty acids while the auxiliary pathway performs additional steps required for the degradation of unsaturated fatty acids.
  • Mitochondria oxidize short-, medium-, and long-chain fatty acids to produce energy for cells.
  • Mitochondrial beta-oxidation is a major energy source for cardiac and skeletal muscle. In liver, it provides ketone bodies to the peripheral circulation when glucose levels are low as in starvation, endurance exercise, and diabetes (Eaton, S. et al. (1996) Biochem. J. 320:345-357).
  • Peroxisomes oxidize medium-, long-, and very-long-chain fatty acids, dicarboxylic fatty acids, branched fatty acids, prostaglandins, xenobiotics, and bile acid intermediates.
  • The chief roles of peroxisomal beta-oxidation are to shorten toxic lipophilic carboxylic acids to facilitate their excretion and to shorten very-long-chain fatty acids prior to mitochondrial beta-oxidation (Mannaerts, G.P. and P.P. van Veldhoven (1993) Biochimie 75:147-158).
  • Enzymes involved in beta-oxidation include acyl CoA synthetase, carnitine acyltransferase, acyl CoA dehydrogenases, enoyl CoA hydratases, L-3-hydroxyacyl CoA dehydrogenase, β-ketothiolase, 2,4-dienoyl CoA reductase, and isomerase.
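  • The sequential removal of two-carbon units in beta-oxidation lends itself to a small worked calculation. The Python sketch below is a simplified illustration that assumes a saturated, even-chain fatty acyl-CoA degraded completely to acetyl-CoA; the function name `beta_oxidation_yield` is hypothetical, and the auxiliary steps needed for unsaturated fatty acids are not modeled.

```python
def beta_oxidation_yield(carbons: int) -> dict:
    """Count beta-oxidation cycles and acetyl-CoA units released.

    Assumes a saturated, even-chain fatty acyl-CoA: every cycle removes
    one two-carbon unit, and the final cycle splits a four-carbon chain
    into two acetyl-CoA molecules.
    """
    if carbons < 4 or carbons % 2 != 0:
        raise ValueError("sketch covers even-chain fatty acids of 4+ carbons")
    return {"cycles": carbons // 2 - 1, "acetyl_coa": carbons // 2}


print(beta_oxidation_yield(16))  # palmitoyl-CoA
```

  • For the 16-carbon palmitoyl-CoA, the sketch reports seven cycles and eight acetyl-CoA units.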
  • LPLs Lysophospholipases
  • A particular substrate for LPLs, lysophosphatidylcholine, causes lysis of cell membranes when it is formed or imported into a cell.
  • LPLs are regulated by lipid factors including acylcarnitine, arachidonic acid, and phosphatidic acid.
  • The secretory phospholipase A2 (PLA2) superfamily comprises a number of heterogeneous enzymes whose common feature is to hydrolyze the sn-2 fatty acid acyl ester bond of phosphoglycerides. Hydrolysis of the glycerophospholipids releases free fatty acids and lysophospholipids.
  • PLA2 activity generates precursors for the biosynthesis of biologically active lipids, hydroxy fatty acids, and platelet-activating factor.
  • PLA2 hydrolysis of the sn-2 ester bond in phospholipids generates free fatty acids, such as arachidonic acid, and lysophospholipids.
  • Carbohydrates, including sugars or saccharides, starch, and cellulose, are aldehyde or ketone compounds with multiple hydroxyl groups. The importance of carbohydrate metabolism is demonstrated by the sensitive regulatory system in place for maintenance of blood glucose levels. Two pancreatic hormones, insulin and glucagon, promote increased glucose uptake and storage by cells, and increased glucose release from cells, respectively. Carbohydrates have three important roles in mammalian cells. First, carbohydrates are used as energy stores, fuels, and metabolic intermediates. Carbohydrates are broken down to form energy in glycolysis and are stored as glycogen for later use.
  • Second, the sugars deoxyribose and ribose form part of the structural support of DNA and RNA, respectively.
  • Third, carbohydrate modifications are added to secreted and membrane proteins and lipids as they traverse the secretory pathway.
  • Cell surface carbohydrate-containing macromolecules, including glycoproteins, glycolipids, and transmembrane proteoglycans, mediate adhesion with other cells and with components of the extracellular matrix.
  • The extracellular matrix is composed of diverse glycoproteins, glycosaminoglycans (GAGs), and carbohydrate-binding proteins which are secreted from the cell and assembled into an organized meshwork in close association with the cell surface.
  • Carbohydrate metabolism is altered in several disorders including diabetes mellitus, hyperglycemia, hypoglycemia, galactosemia, galactokinase deficiency, and UDP-galactose-4-epimerase deficiency (Fauci, A.S. et al. (1998) Harrison's Principles of Internal Medicine, McGraw-Hill, New York NY, pp. 2208-2209). Altered carbohydrate metabolism is associated with cancer.
  • The pathway also provides building blocks for the synthesis of cellular components such as long-chain fatty acids.
  • Pyruvate is converted to acetyl-Coenzyme A, which, in aerobic organisms, enters the citric acid cycle.
  • Glycolytic enzymes include hexokinase, phosphoglucose isomerase, phosphofructokinase, aldolase, triose phosphate isomerase, glyceraldehyde 3-phosphate dehydrogenase, phosphoglycerate kinase, phosphoglyceromutase, enolase, and pyruvate kinase.
  • phosphofructokinase, hexokinase, and pyruvate kinase are important in regulating the rate of glycolysis.
  • Gluconeogenesis is the synthesis of glucose from noncarbohydrate precursors such as lactate and amino acids.
  • The pathway, which functions mainly in times of starvation and intense exercise, occurs mostly in the liver and kidney.
  • responsible enzymes include pyruvate carboxylase, phosphoenolpyruvate carboxykinase, fructose 1,6-bisphosphatase, and glucose-6-phosphatase.
  • Pentose phosphate pathway enzymes are responsible for generating the reducing agent NADPH, while at the same time oxidizing glucose-6-phosphate to ribose-5-phosphate. Ribose-5-phosphate and its derivatives become part of important biological molecules such as ATP, Coenzyme A, NAD+, FAD, RNA, and DNA.
  • The pentose phosphate pathway has both oxidative and non-oxidative branches. The oxidative branch steps, which are catalyzed by the enzymes glucose-6-phosphate dehydrogenase, lactonase, and 6-phosphogluconate dehydrogenase, convert glucose-6-phosphate and NADP+ to ribulose-5-phosphate and NADPH.
  • The non-oxidative branch steps, which are catalyzed by the enzymes phosphopentose isomerase, phosphopentose epimerase, transketolase, and transaldolase, allow the interconversion of three-, four-, five-, six-, and seven-carbon sugars.
  • Glucuronate Metabolism
  • Glucuronate is a monosaccharide which, in the form of D-glucuronic acid, is found in the GAGs chondroitin and dermatan. D-glucuronic acid is also important in the detoxification and excretion of foreign organic compounds such as phenol. Enzymes involved in glucuronate metabolism include UDP-glucose dehydrogenase and glucuronate reductase.
  • Disaccharide Metabolism
  • Disaccharides must be hydrolyzed to monosaccharides to be digested. Lactose, a disaccharide found in milk, is hydrolyzed to galactose and glucose by the enzyme lactase. Maltose is derived from plant starch and is hydrolyzed to glucose by the enzyme maltase. Sucrose is derived from plants and is hydrolyzed to glucose and fructose by the enzyme sucrase. Trehalose, a disaccharide found mainly in insects and mushrooms, is hydrolyzed to glucose by the enzyme trehalase (OMIM *275360 Trehalase; Ruf, J. et al. (1990) J. Biol. Chem. 265:15034-15039).
  • Lactase, maltase, sucrase, and trehalase are bound to mucosal cells lining the small intestine, where they participate in the digestion of dietary disaccharides.
  • Lactose synthetase, composed of the catalytic subunit galactosyltransferase and the modifier subunit α-lactalbumin, converts UDP-galactose and glucose to lactose in the mammary glands.
  • Glycogen, Starch, and Chitin Metabolism
  • Glycogen is the storage form of carbohydrates in mammals. Mobilization of glycogen maintains glucose levels between meals and during muscular activity. Glycogen is stored mainly in the
  • Enzymes that catalyze the degradation of glycogen include glycogen phosphorylase, a transferase, α-1,6-glucosidase, and phosphoglucomutase. Enzymes that catalyze the synthesis of glycogen include UDP-glucose pyrophosphorylase, glycogen synthetase, a branching enzyme, and nucleoside diphosphokinase.
  • The enzymes of glycogen synthesis and degradation are tightly regulated by the hormones insulin, glucagon, and epinephrine.
  • Starch, a plant-derived polysaccharide, is hydrolyzed to maltose, maltotriose, and α-dextrin by α-amylase, an enzyme secreted by the salivary glands and pancreas.
  • Chitin is a polysaccharide found in insects and Crustacea.
  • A chitotriosidase is secreted by macrophages and may play a role in the degradation of chitin-containing pathogens (Boot, R.G. et al. (1995) J. Biol. Chem. 270:26252-26256).
  • GAGs are anionic linear unbranched polysaccharides composed of repetitive disaccharide units. These repetitive units contain a derivative of an amino sugar, either glucosamine or galactosamine. GAGs exist free or as part of proteoglycans, large molecules composed of a core protein attached to one or more GAGs. GAGs are found on the cell surface, inside cells, and in the extracellular matrix. Changes in GAG levels are associated with several autoimmune diseases including autoimmune thyroid disease, autoimmune diabetes mellitus, and systemic lupus erythematosus (Hansen, C. et al. (1996) Clin. Exp. Rheum. 14 (Suppl.
  • GAGs include chondroitin sulfate, keratan sulfate, heparin, heparan sulfate, dermatan sulfate, and hyaluronan.
  • The GAG hyaluronan (HA) is found in the extracellular matrix of many cells, especially in soft connective tissues, and is abundant in synovial fluid (Pitsillides, A.A. et al. (1993) Int. J. Exp. Pathol. 74:27-34). HA seems to play important roles in cell regulation, development, and differentiation (Laurent, T.C. and J.R. Fraser (1992) FASEB J. 6:2397-2404).
  • Hyaluronidase is an enzyme that degrades HA to oligosaccharides. Hyaluronidases may function in cell adhesion, infection, angiogenesis, signal transduction, reproduction, cancer, and inflammation.
  • Proteoglycans, also known as peptidoglycans, are found in the extracellular matrix of connective tissues such as cartilage and are essential for distributing the load in weight-bearing joints.
  • Cell-surface-attached proteoglycans anchor cells to the extracellular matrix. Both extracellular and cell-surface proteoglycans bind growth factors, facilitating their binding to cell-surface receptors and subsequent triggering of signal transduction pathways.
  • Amino Acid and Nitrogen Metabolism
  • NH3 is assimilated into amino acids by the actions of two enzymes, glutamate dehydrogenase and glutamine synthetase.
  • The carbon skeletons of amino acids come from the intermediates of glycolysis, the pentose phosphate pathway, or the citric acid cycle. Of the twenty amino acids used in proteins, humans can synthesize only eleven (nonessential amino acids). The remaining nine must come from the diet (essential amino acids).
  • Enzymes involved in nonessential amino acid biosynthesis include glutamate kinase dehydrogenase, pyrroline carboxylate reductase, asparagine synthetase, phenylalanine oxygenase, methionine adenosyltransferase, adenosylhomocysteinase, cystathionine β-synthase, cystathionine γ-lyase, phosphoglycerate dehydrogenase, phosphoserine transaminase, phosphoserine phosphatase, serine hydroxymethyltransferase, and glycine synthase.
  • Metabolism of amino acids takes place almost entirely in the liver, where the amino group is removed by aminotransferases (transaminases), for example, alanine aminotransferase.
  • The amino group is transferred to α-ketoglutarate to form glutamate.
  • Glutamate dehydrogenase converts glutamate to NH4+ and α-ketoglutarate.
  • NH3 is converted to urea by the urea cycle, which is catalyzed by the enzymes arginase, ornithine transcarbamoylase, arginosuccinate synthetase, and arginosuccinase.
  • Carbamoyl phosphate synthetase is also involved in urea formation.
  • Enzymes involved in the metabolism of the carbon skeleton of amino acids include serine dehydratase, asparaginase, glutaminase, propionyl CoA carboxylase, methylmalonyl CoA mutase, branched-chain α-keto dehydrogenase complex, isovaleryl CoA dehydrogenase, β-methylcrotonyl CoA carboxylase, phenylalanine hydroxylase, p-hydroxylphenylpyruvate hydroxylase, and homogentisate oxidase.
  • Polyamines, which include spermidine, putrescine, and spermine, bind tightly to nucleic acids and are abundant in rapidly proliferating cells. Enzymes involved in polyamine synthesis include ornithine decarboxylase.
  • Cells derive energy from metabolism of ingested compounds that may be roughly categorized as carbohydrates, fats, or proteins. Energy is also stored in polymers such as triglycerides (fats) and glycogen (carbohydrates). Metabolism proceeds along separate reaction pathways connected by key intermediates such as acetyl coenzyme A (acetyl-CoA). Metabolic pathways feature anaerobic and aerobic degradation, coupled with the energy-requiring reactions such as phosphorylation of adenosine diphosphate (ADP) to the triphosphate (ATP) or analogous phosphorylations of guanosine (GDP/GTP), uridine (UDP/UTP), or cytidine (CDP/CTP). Subsequent dephosphorylation of the triphosphate drives reactions needed for cell maintenance, growth, and proliferation.
  • Digestive enzymes convert carbohydrates and sugars to glucose; fructose and galactose are converted in the liver to glucose. Enzymes involved in these conversions include galactose-1-phosphate uridyl transferase and UDP-galactose-4 epimerase. In the cytoplasm, glycolysis converts glucose to pyruvate in a series of reactions coupled to ATP synthesis.
  • Pyruvate is transported into the mitochondria and converted to acetyl-CoA for oxidation via the citric acid cycle, involving pyruvate dehydrogenase components, dihydrolipoyl transacetylase, and dihydrolipoyl dehydrogenase.
  • Enzymes involved in the citric acid cycle include: citrate synthetase, aconitases, isocitrate dehydrogenase, alpha-ketoglutarate dehydrogenase complex including transsuccinylases, succinyl CoA synthetase, succinate dehydrogenase, fumarases, and malate dehydrogenase.
  • Acetyl CoA is oxidized to CO2 with concomitant formation of NADH, FADH2, and GTP.
  • The transport of electrons from NADH and FADH2 to oxygen by dehydrogenases is coupled to the synthesis of ATP from ADP and Pi by the F0F1 ATPase complex in the mitochondrial inner membrane.
  • Enzyme complexes responsible for electron transport and ATP synthesis include the F0F1 ATPase complex, ubiquinone(CoQ)-cytochrome c reductase, ubiquinone reductase, cytochrome b, cytochrome c1, FeS protein, and cytochrome c oxidase.
  • Triglycerides are hydrolyzed to fatty acids and glycerol by lipases. Glycerol is then phosphorylated to glycerol-3-phosphate by glycerol kinase and glycerol phosphate dehydrogenase, and degraded by glycolysis. Fatty acids are transported into the mitochondria as fatty acyl-carnitine esters and undergo oxidative degradation.
  • Cofactor Metabolism
  • Cofactors, including coenzymes and prosthetic groups, are small molecular weight inorganic or organic compounds that are required for the action of an enzyme. Many cofactors contain vitamins as a component.
  • Cofactors include thiamine pyrophosphate, flavin adenine dinucleotide, flavin mononucleotide, nicotinamide adenine dinucleotide, pyridoxal phosphate, coenzyme A, tetrahydrofolate, lipoamide, and heme.
  • The vitamins biotin and cobalamin are associated with enzymes as well.
  • Heme, a prosthetic group found in myoglobin and hemoglobin, consists of a protoporphyrin group bound to iron. Porphyrin groups contain four substituted pyrroles covalently joined in a ring, often with a bound metal atom.
  • Enzymes involved in porphyrin synthesis include δ-aminolevulinate synthase, δ-aminolevulinate dehydrase, porphobilinogen deaminase, and cosynthase. Deficiencies in heme formation cause porphyrias. Heme is broken down as a part of erythrocyte turnover. Enzymes involved in heme degradation include heme oxygenase and biliverdin reductase. Iron is a required cofactor for many enzymes. Besides the heme-containing enzymes, iron is found in iron-sulfur clusters in proteins including aconitase, succinate dehydrogenase, and NADH-Q reductase. Iron is transported in the blood by the protein transferrin. Binding of transferrin to the transferrin receptor on cell surfaces allows uptake by receptor mediated endocytosis. Cytosolic iron is bound to ferritin protein
  • A molybdenum-containing cofactor (molybdopterin) is found in enzymes including sulfite oxidase, xanthine dehydrogenase, and aldehyde oxidase. Molybdopterin biosynthesis is performed by two molybdenum cofactor synthesizing enzymes. Deficiencies in these enzymes cause mental retardation and lens dislocation. Other diseases caused by defects in cofactor metabolism include pernicious anemia and methylmalonic aciduria.
  • Secretion and Trafficking
  • Eukaryotic cells are bound by a lipid bilayer membrane and subdivided into functionally distinct, membrane bound compartments.
  • The membranes maintain the essential differences between the cytosol, the extracellular environment, and the lumenal space of each intracellular organelle.
  • As lipid membranes are highly impermeable to most polar molecules, transport of essential nutrients, metabolic waste products, cell signaling molecules, macromolecules and proteins across lipid membranes and between organelles must be mediated by a variety of transport-associated molecules.
  • Protein Trafficking
  • The final Golgi compartment is the Trans-Golgi Network (TGN), where both membrane and lumenal proteins are sorted for their final destination.
  • a secretory vesicle which contains proteins destined for the plasma membrane, such as receptors, adhesion molecules, and ion channels, and secretory proteins, such as hormones, neurotransmitters, and digestive enzymes.
  • Secretory vesicles eventually fuse with the plasma membrane (Glick, B.S. and V. Malhotra (1998) Cell 95:883-889).
  • The secretory process can be constitutive or regulated.
  • cells have a constitutive pathway for secretion, whereby vesicles derived from maturation of the TGN require no specific signal to fuse with the plasma membrane.
  • cells such as endocrine cells, digestive cells, and neurons
  • Endocytosis
  • Endocytosis, wherein cells internalize material from the extracellular environment, is essential for transmission of neuronal, metabolic, and proliferative signals; uptake of many essential nutrients; and defense against invading organisms.
  • Phagocytosis is an actin-driven process exemplified in macrophages and neutrophils.
  • Material to be endocytosed contacts numerous cell surface receptors which stimulate the plasma membrane to extend and surround the particle, enclosing it in a membrane-bound phagosome.
  • IgG-coated particles bind Fc receptors on the surface of phagocytic leukocytes. Activation of the Fc receptors initiates a signal cascade involving src-family cytosolic kinases and the monomeric GTP-binding (G) protein Rho.
  • The resulting actin reorganization leads to phagocytosis of the particle.
  • This process is an important component of the humoral immune response, allowing the processing and presentation of bacterial-derived peptides to antigen-specific T-lymphocytes.
  • The second form of endocytosis is a more generalized uptake of material from the external milieu.
  • Pinocytosis is activated by ligand binding to cell surface receptors. Activation of individual receptors stimulates an internal response that includes coalescence of the receptor-ligand complexes and formation of clathrin-coated pits. Invagination of the plasma membrane at clathrin-coated pits produces an endocytic vesicle within the cell cytoplasm. These vesicles undergo homotypic fusion to form an early endosomal (EE) compartment.
  • The tubulovesicular EE serves as a sorting site for incoming material.
  • ATP-driven proton pumps in the EE membrane lower the pH of the EE lumen (pH 6.3-6.8).
  • The acidic environment causes many ligands to dissociate from their receptors.
  • The receptors, along with membrane and other integral membrane proteins, are recycled back to the plasma membrane by budding off the tubular extensions of the EE in recycling vesicles (RV).
  • This selective removal of recycled components produces a carrier vesicle containing ligand and other material from the external environment.
  • The carrier vesicle fuses with TGN-derived vesicles which contain hydrolytic enzymes.
  • The acidic environment of the resulting late endosome (LE) activates the hydrolytic enzymes which degrade the ligands and other material. As digestion takes place, the LE fuses with the lysosome where digestion is completed (Mellman, I. (1996) Annu. Rev. Cell Dev. Biol. 12:575-625).
  • Receptors internalized and returned directly to the plasma membrane have a turnover rate of 2-3 minutes.
  • Some RVs undergo microtubule-directed relocation to a perinuclear site, from which they then return to the plasma membrane.
  • Receptors following this route have a turnover rate of 5-10 minutes.
  • Still other RVs are retained within the cell until an appropriate signal is received (Mellman, supra; and James, D.E. et al. (1994) Trends Cell Biol. 4:120-126).
  • Vesicles form at the transitional endoplasmic reticulum (tER), the rim of Golgi cisternae, the face of the Trans-Golgi Network (TGN), the plasma membrane (PM), and tubular extensions of the endosomes.
  • The process begins with the budding of a vesicle out of the donor membrane.
  • The membrane-bound vesicle contains proteins to be transported and is surrounded by a protective coat made up of protein subunits recruited from the cytosol.
  • The initial budding and coating processes are controlled by a cytosolic ras-like GTP-binding protein, ADP-ribosylating factor (Arf), and adapter proteins (AP).
  • Different isoforms of both Arf and AP are involved at different sites of budding.
  • Another small G-protein, dynamin, forms a ring complex around the neck of the forming vesicle and may provide the mechanochemical force to accomplish the final step of the budding process.
  • The coated vesicle complex is then transported through the cytosol.
  • The COP coat consists of two major components, a G-protein (Arf or Sar) and coat protomer (coatomer). Coatomer is an equimolar complex of seven proteins, termed alpha-, beta-, beta'-, gamma-, delta-, epsilon- and zeta-COP. (Harter, C. and F.T. Wieland (1998) Proc. Natl. Acad. Sci. USA 95:11649-11654.)
  • Membrane Fusion
  • Transport vesicles undergo homotypic or heterotypic fusion in the secretory and endocytotic pathways.
  • Molecules required for appropriate targeting and fusion of vesicles with their target membrane include proteins incorporated in the vesicle membrane, the target membrane, and proteins recruited from the cytosol.
  • VAMP vesicle-associated membrane protein
  • a cytosolic prenylated GTP-binding protein, Rab, a member of the Ras superfamily
  • GTPase activating proteins in the target membrane convert Rab proteins to the GDP-bound form.
  • GDI guanine-nucleotide dissociation inhibitor
  • Rab proteins appear to play a role in mediating the function of a viral gene, Rev, which is essential for replication of HIV-1, the virus responsible for AIDS (Flavell, R.A. et al. (1996) Proc. Natl. Acad. Sci. USA 93:4421-4424).
  • N-ethylmaleimide sensitive factor (NSF) and soluble NSF-attachment protein (α-SNAP and β-SNAP) are two such proteins that are conserved from yeast to man and function in most intracellular membrane fusion reactions.
  • Sec1 represents a family of yeast proteins that function at many different stages in the secretory pathway including membrane fusion. Recently, mammalian homologs of Sec1, called Munc-18 proteins, have been identified (Katagiri, H. et al. (1995) J. Biol. Chem. 270:4963-4966; Hata et al. supra).
  • The SNARE complex involves three SNARE molecules, one in the vesicular membrane and two in the target membrane.
  • Synaptotagmin is an integral membrane protein in the synaptic vesicle which associates with the t-SNARE syntaxin in the docking complex. Synaptotagmin binds calcium in a complex with negatively charged phospholipids, which allows the cytosolic SNAP protein to displace synaptotagmin from syntaxin and fusion to occur. Thus, synaptotagmin is a negative regulator of fusion in the neuron (Littleton, J.T. et al. (1993) Cell 74:1125-1134). The most abundant membrane protein of synaptic vesicles appears to be the glycoprotein synaptophysin, a 38 kDa protein with four transmembrane domains.
  • Different isoforms of SNAREs and Rabs show distinct cellular and subcellular distributions.
  • VAMP-1/synaptobrevin, membrane-anchored synaptosome-associated protein of 25 kDa (SNAP-25), syntaxin-1, Rab3A, Rab15, and Rab23 are predominantly expressed in the brain and nervous system.
  • Different syntaxin, VAMP, and Rab proteins are associated with distinct subcellular compartments and their vesicular carriers.
  • Nuclear Transport
  • Transport of proteins and RNA between the nucleus and the cytoplasm occurs through nuclear pore complexes (NPCs).
  • NPC-mediated transport occurs in both directions through the nuclear envelope.
  • All nuclear proteins are imported from the cytoplasm, their site of synthesis.
  • tRNA and mRNA are exported from the nucleus, their site of synthesis, to the cytoplasm, their site of function.
  • Processing of small nuclear RNAs involves export into the cytoplasm, assembly with proteins and modifications such as hypermethylation to produce small nuclear ribonuclear proteins (snRNPs), and subsequent import of the snRNPs back into the nucleus.
  • The assembly of ribosomes requires the initial import of ribosomal proteins from the cytoplasm, their incorporation with RNA into ribosomal subunits, and export back to the cytoplasm. (Görlich, D.
  • NLS nuclear localization signals
  • NTF2 binds to the GDP-bound form of Ran and to multiple proteins of the nuclear pore complex containing FXFG repeat motifs, such as p62. (Paschal, B. et al. (1997) J. Biol. Chem. 272:21534-21539; and Wong, D.H. et al. (1997) Mol. Cell Biol. 17:3755-3767). Some proteins are dissociated before nuclear mRNAs are transported across the NPC while others are dissociated shortly after nuclear mRNA transport across the NPC and are reimported into the nucleus.
  • Disease Correlation
  • Abnormal hormonal secretion is linked to disorders such as diabetes insipidus (vasopressin), hyper- and hypoglycemia (insulin, glucagon), Graves' disease and goiter (thyroid hormone), and Cushing's and Addison's diseases (adrenocorticotropic hormone,
  • cancer cells secrete excessive amounts of hormones or other biologically active peptides.
  • Disorders related to excessive secretion of biologically active peptides by tumor cells include fasting hypoglycemia due to increased insulin secretion from insulinoma-islet cell tumors; hypertension due to increased epinephrine and norepinephrine secreted from pheochromocytomas of the adrenal medulla and sympathetic paraganglia; and carcinoid syndrome, which is characterized by abdominal cramps, diarrhea, and valvular heart disease caused by excessive amounts of vasoactive substances such as serotonin, bradykinin, histamine, prostaglandins, and polypeptide hormones, secreted from intestinal tumors.
  • Biologically active peptides that are ectopically synthesized in and secreted from tumor cells include ACTH and vasopressin (lung and pancreatic cancers); parathyroid hormone (lung and bladder cancers); calcitonin (lung and breast cancers); and thyroid-stimulating hormone (medullary thyroid carcinoma).
  • Such peptides may be useful as diagnostic markers for tumorigenesis (Schwartz, M.Z. (1997) Semin. Pediatr. Surg. 3:141-146; and Said, S.I. and G.R. Faloona (1975) N. Engl. J. Med. 293:155-160).
  • Defective nuclear transport may play a role in cancer.
  • The BRCA1 protein contains three potential NLSs which interact with importin alpha, and is transported into the nucleus by the importin/NPC pathway.
  • In breast cancer cells, the BRCA1 protein is aberrantly localized in the cytoplasm.
  • The mislocation of the BRCA1 protein in breast cancer cells may be due to a defect in the NPC nuclear import pathway (Chen, C.F. et al. (1996) J. Biol. Chem. 271:32863-32868). It has been suggested that in some breast cancers, the tumor-suppressing activity of p53 is inactivated by the sequestration of the protein in the cytoplasm, away from its site of action in the cell nucleus.
  • Cytoplasmic wild-type p53 was also found in human cervical carcinoma cell lines. (Moll, U.M. et al. (1992) Proc. Natl. Acad. Sci. USA 89:7262-7266; and Liang, X.H. et al. (1993) Oncogene 8:2645-2652.)
  • Environmental Responses
  • Organisms respond to the environment by a number of pathways.
  • Heat shock proteins, including hsp70, hsp60, hsp90, and hsp40, assist organisms in coping with heat damage to cellular proteins.
  • Aquaporins are channels that transport water and, in some cases, nonionic small solutes such as urea and glycerol. Water movement is important for a number of physiological processes including renal fluid filtration, aqueous humor generation in the eye, cerebrospinal fluid production in the brain, and appropriate hydration of the lung. Aquaporins are members of the major intrinsic protein (MIP) family of membrane transporters (King, L.S. and P. Agre (1996) Annu. Rev. Physiol. 58:619-648; Ishibashi, K. et al. (1997) J. Biol. Chem. 272:20782-20786).
  • The metallothioneins are a group of small (61 amino acids), cysteine-rich proteins that bind heavy metals such as cadmium, zinc, mercury, lead, and copper and are thought to play a role in metal detoxification or the metabolism and homeostasis of metals.
  • Arsenite-resistance proteins have been identified in hamsters that are resistant to toxic levels of arsenite (Rossman, T.G. et al. (1997) Mutat. Res. 386:307-314).
  • light and odors by specific protein pathways. Proteins involved in light perception include rhodopsin, transducin, and cGMP phosphodiesterase. Proteins involved in odor perception include multiple olfactory receptors. Other proteins are important in human circadian rhythms and responses to wounds.
  • Immunity and Host Defense
  • The cellular components of the humoral immune system include six different types of leukocytes: monocytes, lymphocytes, polymorphonuclear granulocytes (consisting of neutrophils, eosinophils, and basophils) and plasma cells. Additionally, fragments of megakaryocytes, a seventh type of white blood cell in the bone marrow, occur in large numbers in the blood as platelets.
  • Leukocytes are formed from two stem cell lineages in bone marrow.
  • The myeloid stem cell line produces granulocytes and monocytes, and the lymphoid stem cell produces lymphocytes.
  • Lymphoid cells travel to the thymus, spleen and lymph nodes, where they mature and differentiate into lymphocytes.
  • Leukocytes are responsible for defending the body against invading pathogens.
  • Neutrophils and monocytes attack invading bacteria, viruses, and other pathogens and destroy them by phagocytosis.
  • Monocytes enter tissues and differentiate into macrophages which are extremely phagocytic. Lymphocytes and plasma cells are a part of the immune system which recognizes specific foreign molecules and organisms and inactivates them, as well as signals other cells to attack the invaders.
  • Granulocytes and monocytes are formed and stored in the bone marrow until needed. Megakaryocytes are produced in bone marrow, where they fragment into platelets and are released into the bloodstream. The main function of platelets is to activate the blood clotting mechanism.
  • Lymphocytes and plasma cells are produced in various lymphogenous organs, including the lymph nodes, spleen, thymus, and tonsils.
  • Basophils participate in the release of the chemicals involved in the inflammatory process.
  • The main function of basophils is secretion of these chemicals to such a degree that they have been referred to as "unicellular endocrine glands."
  • A distinct aspect of basophilic secretion is that the contents of granules go directly into the extracellular environment, not into vacuoles as occurs with neutrophils, eosinophils and monocytes.
  • Basophils have receptors for the Fc fragment of immunoglobulin E (IgE) that are not present on other leukocytes. Crosslinking of membrane IgE with anti-IgE or other ligands triggers degranulation.
  • Eosinophils are bi- or multi-nucleated white blood cells which contain eosinophilic granules. Their plasma membrane is characterized by Ig receptors, particularly IgG and IgE. Generally, eosinophils are stored in the bone marrow until recruited for use at a site of inflammation or invasion. They have specific functions in parasitic infections and allergic reactions, and are thought to detoxify some of the substances released by mast cells and basophils which cause inflammation. Additionally, they phagocytize antigen-antibody complexes and further help prevent spread of the inflammation.
  • Macrophages are monocytes that have left the blood stream to settle in tissue.
  • The mononuclear phagocyte system is comprised of precursor cells in the bone marrow, monocytes in circulation, and macrophages in tissues.
  • The system is capable of very fast and extensive phagocytosis.
  • A macrophage may phagocytize over 100 bacteria, digest them and extrude residues, and then survive for many more months.
  • Macrophages are also capable of ingesting large particles, including red blood cells and malarial parasites. They increase several-fold in size and transform into macrophages that are characteristic of the tissue they have entered, surviving in tissues for several months.
  • Mononuclear phagocytes are essential in defending the body against invasion by foreign pathogens, particularly intracellular microorganisms such as M. tuberculosis, listeria, leishmania and toxoplasma. Macrophages can also control the growth of tumorous cells, via both phagocytosis and secretion of hydrolytic enzymes. Another important function of macrophages is that of processing antigens and presenting them in a biochemically modified form to lymphocytes.
  • T-lymphocytes originate in the bone marrow or liver in fetuses.
  • Precursor cells migrate via the blood to the thymus, where they are processed to mature into T-lymphocytes. This processing is crucial because of positive and negative selection of T cells that will react with foreign antigen and not with self molecules.
  • T cells continuously circulate in the blood and secondary lymphoid tissues, such as lymph nodes, spleen, certain epithelium-associated tissues in the gastrointestinal tract, respiratory tract and skin.
  • When T-lymphocytes are presented with the complementary antigen, they are stimulated to proliferate and release large numbers of activated T cells into the lymph system and the blood system. These activated T cells can survive and circulate for several days.
  • T memory cells are created, which remain in the lymphoid tissue for months or years. Upon subsequent exposure to that specific antigen, these memory cells will respond more rapidly and with a stronger response than induced by the original antigen. This creates an "immunological memory" that can provide immunity for years.
  • There are two major types of T cells: cytotoxic T cells destroy infected host cells, and helper T cells activate other white blood cells via chemical signals.
  • One type of helper cell, TH1, activates macrophages to destroy ingested microorganisms, while another, TH2, stimulates the production of antibodies by B cells.
  • T cells directly attack the infected target cell.
  • Peptides derived from viral proteins are generated by the proteasome. These peptides are transported into the ER by the transporter associated with antigen processing (TAP) (Pamer, E. and P. Cresswell (1998) Annu. Rev. Immunol. 16:323-358).
  • The peptides bind MHC I chains, and the peptide/MHC I complex is transported to the cell surface.
  • Receptors on the surface of T cells bind to antigen presented on cell surface MHC molecules.
  • Once activated by binding to antigen, T cells secrete γ-interferon, a signal molecule that induces the expression of genes necessary for presenting viral (or other) antigens to cytotoxic T cells. Cytotoxic T cells kill the infected cell by stimulating programmed cell death.
  • Helper T cells constitute up to 75% of the total T cell population. They regulate the immune functions by producing a variety of lymphokines that act on other cells in the immune system and on bone marrow. Among these lymphokines are: interleukins-2, 3, 4, 5, 6; granulocyte-monocyte colony stimulating factor; and γ-interferon.
  • Helper T cells are required for most B cells to respond to antigen.
  • When an activated helper cell contacts a B cell, its centrosome and Golgi apparatus become oriented toward the B cell, aiding the directing of signal molecules, such as a transmembrane-bound protein called CD40 ligand, onto the B cell surface to interact with the CD40 transmembrane protein.
  • Secreted signals also help B cells to proliferate and mature and, in some cases, to switch the class of antibody being produced.
  • B-lymphocytes produce antibodies which react with specific antigenic proteins presented by pathogens. Once activated, B cells become filled with extensive rough endoplasmic reticulum and are known as plasma cells. As with T cells, interaction of B cells with antigen stimulates proliferation of only those B cells which produce antibody specific to that antigen.
  • Antibodies, or immunoglobulins (Ig), are the founding members of the Ig superfamily and the central components of the humoral immune response. Antibodies are either expressed on the surface of B cells or secreted by B cells into the circulation. Antibodies bind and neutralize blood-borne foreign antigens.
  • The prototypical antibody is a tetramer consisting of two identical heavy polypeptide chains (H-chains) and two identical light polypeptide chains (L-chains) interlinked by disulfide bonds. This arrangement confers the characteristic Y-shape to antibody molecules. Antibodies are classified based on their H-chain composition.
  • The five antibody classes, IgA, IgD, IgE, IgG and IgM, are defined by the α, δ, ε, γ, and μ H-chain types. There are two types of L-chains, κ and λ, either of which may associate as a pair with any H-chain pair.
  • IgG, the most common class of antibody found in the circulation, is tetrameric, while the other classes of antibodies are generally variants or multimers of this basic structure.
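The tetramer rule described above (two identical H-chains, two identical L-chains, class set by the H-chain type) can be sketched as a small data structure. This is only an illustration: the names `Antibody`, `HEAVY_TO_CLASS`, and `LIGHT_TYPES` are hypothetical, and the heavy chain to class pairing follows standard immunoglobulin nomenclature (α for IgA, δ for IgD, ε for IgE, γ for IgG, μ for IgM).

```python
from dataclasses import dataclass

# Assumed lookup: H-chain type -> antibody class (standard nomenclature).
HEAVY_TO_CLASS = {
    "alpha": "IgA",
    "delta": "IgD",
    "epsilon": "IgE",
    "gamma": "IgG",
    "mu": "IgM",
}
# The two L-chain types; either may pair with any H-chain pair.
LIGHT_TYPES = {"kappa", "lambda"}


@dataclass(frozen=True)
class Antibody:
    """Hypothetical model of the prototypical H2L2 antibody tetramer."""
    heavy: str
    light: str

    def __post_init__(self):
        if self.heavy not in HEAVY_TO_CLASS:
            raise ValueError(f"unknown H-chain type: {self.heavy}")
        if self.light not in LIGHT_TYPES:
            raise ValueError(f"unknown L-chain type: {self.light}")

    @property
    def ig_class(self):
        # The class depends only on the H-chain type.
        return HEAVY_TO_CLASS[self.heavy]

    @property
    def chains(self):
        # Two identical H-chains plus two identical L-chains.
        return [self.heavy, self.heavy, self.light, self.light]
```

For example, `Antibody("gamma", "kappa").ig_class` evaluates to `"IgG"`, and swapping the light chain to `"lambda"` leaves the class unchanged. The sketch models only the basic tetramer, not the multimeric variants mentioned for the other classes.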
  • H-chains and L-chains each contain an N-terminal variable region and a C-terminal constant region. Both H-chains and L-chains contain repeated Ig domains. For example, a typical H-chain contains four Ig domains, three of which occur within the constant region and one of which occurs within the variable region and contributes to the formation of the antigen recognition site. Likewise, a typical L-chain contains two Ig domains, one of which occurs within the constant region and one of which occurs within the variable region. In addition, H-chains such as μ have been shown to associate with other polypeptides during differentiation of the B cell. Antibodies can be described in terms of their two main functional domains.
  • Antigen recognition is mediated by the Fab (antigen binding fragment) region of the antibody, while effector functions are mediated by the Fc (crystallizable fragment) region.
  • Binding of antibody to an antigen such as a bacterium triggers the destruction of the antigen by phagocytic white blood cells such as macrophages and neutrophils.
  • These cells express surface receptors that specifically bind to the antibody Fc region and allow the phagocytic cells to engulf, ingest, and degrade the antibody-bound antigen.
  • The Fc receptors expressed by phagocytic cells are single-pass transmembrane glycoproteins of about 300 to 400 amino acids (Sears, D.W. et al. (1990) J. Immunol. 144:371-378).
  • A well-known immune system disorder is AIDS (Acquired Immunodeficiency Syndrome), in which the number of helper T cells is depleted, leaving the patient susceptible to infection by microorganisms and parasites.
  • Another widespread medical condition attributable to the immune system is that of allergic reactions to certain antigens. Allergic reactions include hay fever, asthma, anaphylaxis, and urticaria (hives).
  • Leukemias are an excess production of white blood cells, to the point where a major portion of the body's metabolic resources are directed solely at proliferation of white blood cells, leaving other tissues to starve.
  • Leukopenia or agranulocytosis occurs when the bone marrow stops producing white blood cells. This leaves the body unprotected against foreign microorganisms, including those which normally inhabit skin, mucous membranes, and gastrointestinal tract. If all white blood cell production stops completely, infection will occur within two days and death may follow only 1 to 4 days later.
  • Impaired phagocytosis occurs in several diseases, including monocytic leukemia, systemic lupus, and granulomatous disease. In such a situation, macrophages can phagocytize normally, but the enveloped organism is not killed. A defect in the plasma membrane enzyme which converts oxygen to lethally reactive forms results in abscess formation in liver, lungs, spleen, lymph nodes, and beneath the skin.
  • Eosinophilia is an excess of eosinophils commonly observed in patients with allergies (hay fever, asthma), allergic reactions to drugs, rheumatoid arthritis, and cancers (Hodgkin's disease, lung, and liver cancer) (Isselbacher, K.J. et al. (1994) Harrison's Principles of Internal Medicine, McGraw-Hill, Inc., New York NY).
  • The complement system serves as an effector system and is involved in infectious agent recognition. It can function as an independent immune network or in conjunction with other humoral immune responses.
  • The complement system is comprised of numerous plasma and membrane proteins that act in a cascade of reaction sequences whereby one component activates the next. The result is a rapid and amplified response to infection through either an inflammatory response or increased phagocytosis.
  • The complement system has more than 30 protein components which can be divided into functional groupings including modified serine proteases, membrane-binding proteins, and regulators of complement activation. Activation occurs through two different pathways, the classical and the alternative. Both pathways serve to destroy infectious agents through distinct triggering mechanisms that eventually merge with the involvement of the component C3.
  • The classical pathway requires antibody binding to infectious agent antigens.
  • The antibodies serve to define the target and initiate the complement system cascade, culminating in the destruction of the infectious agent.
  • The complement system can be seen as an effector arm of the humoral immune system.
  • The alternative pathway of the complement system does not require the presence of pre-existing antibodies for targeting infectious agent destruction. Rather, this pathway, through low levels of an activated component, remains constantly primed and provides surveillance in the non-immune host to enable targeting and destruction of infectious agents. In this case foreign material triggers the cascade, thereby facilitating phagocytosis or lysis (Paul, supra, pp. 918-919).
  • Inflammatory responses are divided into four categories on the basis of pathology and include allergic inflammation, cytotoxic antibody mediated inflammation, immune complex mediated inflammation, and monocyte mediated inflammation. Inflammation manifests as a combination of each of these forms with one predominating.
  • Allergic acute inflammation is observed in individuals wherein specific antigens stimulate IgE antibody production.
  • Mast cells and basophils are subsequently activated by the attachment of antigen-IgE complexes, resulting in the release of cytoplasmic granule contents such as histamine.
  • The products of activated mast cells can increase vascular permeability and constrict the smooth muscle of breathing passages, resulting in anaphylaxis or asthma.
  • Acute inflammation is also mediated by cytotoxic antibodies and can result in the destruction of tissue through the binding of complement-fixing antibodies to cells.
  • The responsible antibodies are of the IgG or IgM types. Resultant clinical disorders include autoimmune hemolytic anemia and thrombocytopenia as associated with systemic lupus erythematosus.
  • Immune complex mediated acute inflammation involves the IgG or IgM antibody types which combine with antigen to activate the complement cascade.
  • When immune complexes bind to neutrophils and macrophages, they activate the respiratory burst to form protein- and vessel-damaging agents such as hydrogen peroxide, hydroxyl radical, hypochlorous acid, and chloramines.
  • Clinical manifestations include rheumatoid arthritis and systemic lupus erythematosus.
  • Intercellular communication is essential for the growth and survival of multicellular organisms, and in particular, for the function of the endocrine, nervous, and immune systems.
  • In addition, intercellular communication is critical for developmental processes such as tissue construction and organogenesis, in which cell proliferation, cell differentiation, and morphogenesis must be spatially and temporally regulated in a precise and coordinated manner.
  • Cells communicate with one another through the secretion and uptake of diverse types of signaling molecules such as hormones, growth factors, neuropeptides, and cytokines.
Hormones
  • Hormones are signaling molecules that coordinately regulate basic physiological processes from embryogenesis throughout adulthood. These processes include metabolism, respiration, reproduction, excretion, fetal tissue differentiation and organogenesis, growth and development, homeostasis, and the stress response. Hormonal secretions and the nervous system are tightly integrated and interdependent. Hormones are secreted by endocrine glands, primarily the hypothalamus and pituitary, the thyroid and parathyroid, the pancreas, the adrenal glands, and the ovaries and testes. The secretion of hormones into the circulation is tightly controlled. Hormones are often secreted in diurnal, pulsatile, and cyclic patterns.
  • Hormone secretion is regulated by perturbations in blood biochemistry, by other upstream-acting hormones, by neural impulses, and by negative feedback loops. Blood hormone concentrations are constantly monitored and adjusted to maintain optimal, steady-state levels. Once secreted, hormones act only on those target cells that express specific receptors.
  • Hyposecretion often occurs when a hormone's gland of origin is damaged or otherwise impaired. Hypersecretion often results from the proliferation of tumors derived from hormone-secreting cells. Inappropriate hormone levels may also be caused by defects in regulatory feedback loops or in the processing of hormone precursors. Endocrine malfunction may also occur when the target cell fails to respond to the hormone.
  • Hormones can be classified biochemically as polypeptides, steroids, eicosanoids, or amines.
  • Polypeptides, which include diverse hormones such as insulin and growth hormone, vary in size and function and are often synthesized as inactive precursors that are processed intracellularly into mature, active forms.
  • Amines, which include epinephrine and dopamine, are amino acid derivatives that function in neuroendocrine signaling.
  • Steroids, which include the cholesterol-derived hormones estrogen and testosterone, function in sexual development and reproduction.
  • Eicosanoids, which include prostaglandins and prostacyclins, are fatty acid derivatives that function in a variety of processes.
  • Hypothalamic hormones include thyrotropin-releasing hormone, gonadotropin-releasing hormone, somatostatin, growth-hormone releasing factor, corticotropin-releasing hormone, substance P, dopamine, and prolactin-releasing hormone. These hormones directly regulate the secretion of hormones from the anterior lobe of the pituitary.
  • Hormones secreted by the anterior pituitary include adrenocorticotropic hormone (ACTH), melanocyte-stimulating hormone, somatotropic hormones such as growth hormone and prolactin, glycoprotein hormones such as thyroid-stimulating hormone, luteinizing hormone (LH), and follicle-stimulating hormone (FSH), β-lipotropin, and β-endorphins. These hormones regulate hormonal secretions from the thyroid, pancreas, and adrenal glands, and act directly on the reproductive organs to stimulate ovulation and spermatogenesis.
  • The posterior pituitary synthesizes and secretes antidiuretic hormone (ADH, vasopressin) and oxytocin.
  • Disorders of the hypothalamus and pituitary often result from lesions such as primary brain tumors, adenomas, infarction associated with pregnancy, hypophysectomy, aneurysms, vascular malformations, thrombosis, infections, immunological disorders, and complications due to head trauma. Such disorders have profound effects on the function of other endocrine glands.
  • Disorders associated with hypopituitarism include hypogonadism, Sheehan syndrome, diabetes insipidus, Kallman's disease, Hand-Schuller-Christian disease, Letterer-Siwe disease, sarcoidosis, empty sella syndrome, and dwarfism.
  • Disorders associated with hyperpituitarism include acromegaly, gigantism, and syndrome of inappropriate ADH secretion (SIADH), often caused by benign adenomas.
  • Hormones secreted by the thyroid and parathyroid primarily control metabolic rates and the regulation of serum calcium levels, respectively.
  • Thyroid hormones include calcitonin, somatostatin, and thyroid hormone.
  • The parathyroid secretes parathyroid hormone.
  • Disorders associated with hypothyroidism include goiter, myxedema, acute thyroiditis associated with bacterial infection, subacute thyroiditis associated with viral infection, autoimmune thyroiditis (Hashimoto's disease), and cretinism.
  • Disorders associated with hyperthyroidism include thyrotoxicosis and its various forms, Graves' disease, pretibial myxedema, toxic multinodular goiter, thyroid carcinoma, and Plummer's disease.
  • Disorders associated with hyperparathyroidism include Conn disease (chronic hypercalcemia) leading to bone resorption and parathyroid hyperplasia.
  • Pancreatic hormones regulate blood glucose levels by modulating the rates of carbohydrate, fat, and protein metabolism.
  • Pancreatic hormones include insulin, glucagon, amylin, γ-aminobutyric acid, gastrin, somatostatin, and pancreatic polypeptide.
  • The principal disorder associated with pancreatic dysfunction is diabetes mellitus caused by insufficient insulin activity. Diabetes mellitus is generally classified as either Type I (insulin-dependent, juvenile diabetes) or Type II (non-insulin-dependent, adult diabetes). The treatment of both forms by insulin replacement therapy is well known.
  • Diabetes mellitus often leads to acute complications such as hypoglycemia (insulin shock), coma, diabetic ketoacidosis, and lactic acidosis, and to chronic complications leading to disorders of the eye, kidney, skin, bone, joint, cardiovascular system, and nervous system, and to decreased resistance to infection.
  • Growth factors are secreted proteins that mediate intercellular communication. Unlike hormones, which travel great distances via the circulatory system, most growth factors are primarily local mediators that act on neighboring cells. Most growth factors contain a hydrophobic N-terminal signal peptide sequence which directs the growth factor into the secretory pathway. Most growth factors also undergo post-translational modifications within the secretory pathway. These modifications can include proteolysis, glycosylation, phosphorylation, and intramolecular disulfide bond formation. Once secreted, growth factors bind to specific receptors on the surfaces of neighboring target cells, and the bound receptors trigger intracellular signal transduction pathways. These signal transduction pathways elicit specific cellular responses in the target cells. These responses can include the modulation of gene expression and the stimulation or inhibition of cell division, cell differentiation, and cell motility.
  • The broadest class includes the large polypeptide growth factors, which are wide-ranging in their effects. These factors include epidermal growth factor (EGF), fibroblast growth factor (FGF), transforming growth factor-β (TGF-β), insulin-like growth factor (IGF), nerve growth factor (NGF), and platelet-derived growth factor (PDGF), each defining a family of numerous related factors.
  • The large polypeptide growth factors act as mitogens on diverse cell types to stimulate wound healing, bone synthesis and remodeling, extracellular matrix synthesis, and proliferation of epithelial, epidermal, and connective tissues.
  • Members of the TGF-β, EGF, and FGF families also function as inductive signals in the differentiation of embryonic tissue. NGF functions specifically as a neurotrophic factor, promoting neuronal growth and differentiation.
  • Another class of growth factors includes the hematopoietic growth factors, which are narrow in their target specificity. These factors stimulate the proliferation and differentiation of blood cells such as B-lymphocytes, T-lymphocytes, erythrocytes, platelets, eosinophils, basophils, neutrophils, macrophages, and their stem cell precursors. These factors include the colony-stimulating factors (U-
  • Cytokines are specialized hematopoietic factors secreted by cells of the immune system and are discussed in detail below.
  • Growth factors play critical roles in neoplastic transformation of cells in vitro and in tumor progression in vivo. Overexpression of the large polypeptide growth factors promotes the proliferation and transformation of cells in culture. Inappropriate expression of these growth factors by tumor cells in vivo may contribute to tumor vascularization and metastasis. Inappropriate activity of hematopoietic growth factors can result in anemias, leukemias, and lymphomas. Moreover, growth factors are both structurally and functionally related to oncoproteins, the potentially cancer-causing products of proto-oncogenes.
  • FGF and PDGF family members are themselves homologous to oncoproteins, whereas receptors for some members of the EGF, NGF, and FGF families are encoded by proto-oncogenes. Growth factors also affect the transcriptional regulation of both proto-oncogenes and oncosuppressor genes (Pimentel, E. (1994) Handbook of Growth Factors, CRC Press, Ann Arbor MI; McKay, I. and I. Leigh, eds. (1993) Growth Factors: A Practical Approach, Oxford University Press, New York NY; Habenicht, A., ed. (1990) Growth Factors, Differentiation Factors, and Cytokines, Springer-Verlag, New York NY).
  • Neuropeptides and vasomediators comprise a family of small peptide factors, typically of 20 amino acids or less. These factors generally function in neuronal excitation and inhibition of vasoconstriction/vasodilation, muscle contraction, and hormonal secretions from the brain and other endocrine tissues.
  • Neuropeptides and vasomediators (NP/VMs) include neuropeptides and neuropeptide hormones such as bombesin, neuropeptide Y, neurotensin, neuromedin N, melanocortins, opioids, galanin, somatostatin, tachykinins, urotensin II and related peptides involved in smooth muscle stimulation, vasopressin, vasoactive intestinal peptide, and circulatory system-borne signaling molecules such as angiotensin, complement, calcitonin, endothelins, formyl-methionyl peptides, glucagon, cholecystokinin, gastrin, and many of the peptide hormones discussed above.
  • NP/VMs can transduce signals directly, modulate the activity or release of other neurotransmitters and hormones, and act as catalytic enzymes in signaling cascades.
  • The effects of NP/VMs range from extremely brief to long-lasting. (Reviewed in Martin, C.R. et al. (1985) Endocrine Physiology, Oxford University Press, New York NY, pp. 57-62.)
Cytokines
  • Cytokines comprise a family of signaling molecules that modulate the immune system and the inflammatory response. Cytokines are usually secreted by leukocytes, or white blood cells, in response to injury or infection. Cytokines function as growth and differentiation factors that act primarily on cells of the immune system such as B- and T-lymphocytes, monocytes, macrophages, and granulocytes. Like other signaling molecules, cytokines bind to specific plasma membrane receptors and trigger intracellular signal transduction pathways which alter gene expression patterns. There is considerable potential for the use of cytokines in the treatment of inflammation and immune system disorders. Cytokine structure and function have been extensively characterized in vitro.
  • Cytokine subfamilies include the interferons (IFN-α, -β, and -γ), the interleukins (IL1-IL13), the tumor necrosis factors (TNF-α and -β), and the chemokines.
  • Many cytokines have been produced using recombinant DNA techniques, and the activities of individual cytokines have been determined in vitro. These activities include regulation of leukocyte proliferation, differentiation, and motility.
  • Cytokine activity in vitro may not reflect the full scope of that cytokine's activity in vivo.
  • Cytokines are not expressed individually in vivo but are instead expressed in combination with a multitude of other cytokines when the organism is challenged with a stimulus. Together, these cytokines collectively modulate the immune response in a manner appropriate for that particular stimulus. Therefore, the physiological activity of a cytokine is determined by the stimulus itself and by complex interactive networks among co-expressed cytokines which may demonstrate both synergistic and antagonistic relationships.
  • Chemokines comprise a cytokine subfamily with over 30 members. (Reviewed in Wells, T.N.C. and M.C. Peitsch (1997) J. Leukoc. Biol. 61:545-550.) Chemokines were initially identified as chemotactic proteins that recruit monocytes and macrophages to sites of inflammation. Recent evidence indicates that chemokines may also play key roles in hematopoiesis and HIV-1 infection. Chemokines are small proteins which range from about 6-15 kilodaltons in molecular weight. Chemokines are further classified as C, CC, CXC, or CX3C based on the number and position of critical cysteine residues.
  • The CC chemokines, for example, each contain a conserved motif consisting of two consecutive cysteines followed by two additional cysteines which occur downstream at 24- and 16-residue intervals, respectively (ExPASy PROSITE database, documents PS00472 and PDOC00434).
  • The presence and spacing of these four cysteine residues are highly conserved, whereas the intervening residues diverge significantly.
  • A conserved tyrosine located about 15 residues downstream of the cysteine doublet seems to be important for chemotactic activity.
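The four-cysteine spacing described above lends itself to a simple pattern search. The following is a minimal sketch, not the PROSITE PS00472 pattern itself: it assumes the "24- and 16-residue intervals" mean 24 residues between the cysteine doublet and the third cysteine and 16 residues between the third and fourth cysteines, and it allows a small, arbitrary tolerance (`slack`). The function names are hypothetical.

```python
import re


def cc_motif_regex(gap1=24, gap2=16, slack=2):
    """Build a regex for CC, then C after ~gap1 residues, then C after ~gap2.

    The gap interpretation and the slack tolerance are assumptions made
    for illustration only.
    """
    lo1, hi1 = max(gap1 - slack, 0), gap1 + slack
    lo2, hi2 = max(gap2 - slack, 0), gap2 + slack
    return re.compile(rf"CC.{{{lo1},{hi1}}}C.{{{lo2},{hi2}}}C")


def find_cc_motif(sequence, gap1=24, gap2=16, slack=2):
    """Return the 0-based start of the first motif match, or None."""
    match = cc_motif_regex(gap1, gap2, slack).search(sequence.upper())
    return match.start() if match else None
```

Applied to a one-letter amino acid sequence, `find_cc_motif` returns the index of the cysteine doublet when the spacing fits and `None` otherwise. Because the intervening residues diverge significantly, the sketch constrains only the cysteine positions and ignores the conserved tyrosine.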
  • Most of the human genes encoding CC chemokines are clustered on chromosome 17, although there are a few examples of CC chemokine genes that map elsewhere.
  • Examples of chemokines include lymphotactin (C chemokine); macrophage chemotactic and activating factor (MCAF/MCP-1; CC chemokine); platelet factor 4 and IL-8 (CXC chemokines); and fractalkine and neurotactin (CX3C chemokines).
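The C, CC, CXC, and CX3C classes are named for the arrangement of the first cysteines. A rough sketch of that naming rule follows, assuming the class can be read from the residues immediately after the first cysteine in the mature sequence; real classification also uses the total number of conserved cysteines, which this illustration ignores. The names `classify_chemokine` and `_PATTERNS` are hypothetical.

```python
import re

# Assumed patterns, tried in order; each is anchored at the first cysteine.
_PATTERNS = [
    ("CC", re.compile(r"CC")),            # adjacent cysteines
    ("CXC", re.compile(r"C[^C]C")),       # one intervening residue
    ("CX3C", re.compile(r"C[^C]{3}C")),   # three intervening residues
]


def classify_chemokine(sequence):
    """Guess the chemokine class from the first cysteine arrangement.

    Returns "CC", "CXC", "CX3C", or "C" (lone leading cysteine), or None
    if the sequence has no cysteine at all.
    """
    seq = sequence.upper()
    first = seq.find("C")
    if first < 0:
        return None
    head = seq[first:first + 5]  # enough residues to cover C-X3-C
    for name, pattern in _PATTERNS:
        if pattern.match(head):
            return name
    return "C"
```

For instance, a sequence whose first cysteines appear as `...CAAAC...` is labeled `"CX3C"`, while one with a lone leading cysteine falls through to `"C"`.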
  • The term receptor describes proteins that specifically recognize other molecules.
  • The category is broad and includes proteins with a variety of functions.
  • The bulk of receptors are cell surface proteins which bind extracellular ligands and produce cellular responses in the areas of growth, differentiation, endocytosis, and immune response.
  • Other receptors facilitate the selective transport of proteins out of the endoplasmic reticulum and localize enzymes to particular locations in the cell.
  • The term may also be applied to proteins which act as receptors for ligands with known or unknown chemical composition and which interact with other cellular components.
  • The steroid hormone receptors, for example, bind to and regulate transcription of DNA.
  • Cell proliferation, differentiation, and migration are important for the formation and function of tissues. Regulatory proteins such as growth factors coordinately control these cellular processes and act as mediators in cell-cell signaling pathways. Growth factors are secreted proteins that bind to specific cell-surface receptors on target cells. The bound receptors trigger intracellular signal transduction pathways which activate various downstream effectors that regulate gene expression, cell division, cell differentiation, cell motility, and other cellular processes.
  • Cell surface receptors are typically integral plasma membrane proteins. These receptors recognize hormones such as catecholamines; peptide hormones; growth and differentiation factors; small peptide factors such as thyrotropin-releasing hormone; galanin, somatostatin, and tachykinins; and circulatory system-borne signaling molecules. Cell surface receptors on immune system cells recognize antigens, antibodies, and major histocompatibility complex (MHC)-bound peptides. Other cell surface receptors bind ligands to be internalized by the cell.
  • Ligands internalized by cell surface receptors include low density lipoproteins (LDL), transferrin, glucose- or mannose-terminal glycoproteins, galactose-terminal glycoproteins, immunoglobulins, phosphovitellogenins, fibrin, proteinase-inhibitor complexes, plasminogen activators, and thrombospondin.
Receptor Protein Kinases
  • Growth factor receptors, including receptors for epidermal growth factor, platelet-derived growth factor, and fibroblast growth factor, as well as the growth modulator α-thrombin, contain intrinsic protein kinase activities. When growth factor binds to the receptor, it triggers the autophosphorylation of a serine, threonine, or tyrosine residue on the receptor. These phosphorylated sites are recognition sites for the binding of other cytoplasmic signaling proteins. These proteins participate in signaling pathways that eventually link the initial receptor activation at the cell surface to the activation of a specific intracellular target molecule.
  • SH2 domains and SH3 domains are found in phospholipase C-γ, PI-3-K p85 regulatory subunit, Ras-GTPase activating protein, and pp60c-src (Lowenstein, E. et al. (1992) Cell 70:431-442).
  • The cytokine family of receptors share a different common binding domain and include transmembrane receptors for growth hormone (GH), interleukins, erythropoietin, and prolactin.
  • Other receptors and second messenger-binding proteins have intrinsic serine/threonine protein kinase activity. These include activin/TGF-β/BMP-superfamily receptors, calcium- and diacylglycerol-activated/phospholipid-dependent protein kinase (PK-C), and RNA-dependent protein kinase (PK-R).
  • Some serine/threonine protein kinases, including nematode Twitchin, have fibronectin-like, immunoglobulin C2-like domains.
  • G-protein coupled receptors (GPCRs) are integral membrane proteins characterized by the presence of seven hydrophobic transmembrane domains which span the plasma membrane and form a bundle of antiparallel alpha (α) helices. These proteins range in size from under 400 to over 1000 amino acids (Strosberg, A.D. (1991) Eur. J. Biochem. 196:1-10; Coughlin, S.R. (1994) Curr. Opin. Cell Biol. 6:191-197).
  • The amino-terminus of the GPCR is extracellular, of variable length, and often glycosylated; the carboxy-terminus is cytoplasmic and generally phosphorylated.
  • Extracellular loops of the GPCR alternate with intracellular loops and link the transmembrane domains.
  • The most conserved domains of GPCRs are the transmembrane domains and the first two cytoplasmic loops.
  • The transmembrane domains account for structural and functional features of the receptor. In most cases, the bundle of α helices forms a binding pocket.
  • The extracellular N-terminal segment or one or more of the three extracellular loops may also participate in ligand binding.
  • Ligand binding activates the receptor by inducing a conformational change in intracellular portions of the receptor.
  • The activated receptor interacts with an intracellular heterotrimeric guanine nucleotide binding (G) protein complex which mediates further intracellular signaling activities, generally the production of second messengers such as cyclic AMP (cAMP), phospholipase C, inositol triphosphate, or interactions with ion channel proteins (Baldwin, J.M. (1994) Curr. Opin. Cell Biol. 6:180-190).
  • GPCRs include those for acetylcholine, adenosine, epinephrine and norepinephrine, bombesin, bradykinin, chemokines, dopamine, endothelin, γ-aminobutyric acid (GABA), follicle-stimulating hormone (FSH), glutamate, gonadotropin-releasing hormone (GnRH), hepatocyte growth factor,
  • GPCR mutations, which may cause loss of function or constitutive activation, have been associated with numerous human diseases (Coughlin, supra). For instance, retinitis pigmentosa may arise from mutations in the rhodopsin gene.
  • Rhodopsin is the retinal photoreceptor which is located within the discs of the eye rod cell.
  • Parma, J. et al. (1993, Nature 365:649-651) report that somatic activating mutations in the thyrotropin receptor cause hyperfunctioning thyroid adenomas and suggest that certain GPCRs susceptible to constitutive activation may behave as protooncogenes.
  • Nuclear receptors bind small molecules such as hormones or second messengers, leading to increased receptor-binding affinity to specific chromosomal DNA elements. In addition, the affinity for other nuclear proteins may also be altered. Such binding and protein-protein interactions may regulate and modulate gene expression. Examples of such receptors include the steroid hormone receptors family, the retinoic acid receptors family, and the thyroid hormone receptors family.
Ligand-Gated Receptor Ion Channels
  • Ligand-gated receptor ion channels fall into two categories.
  • The first category, extracellular ligand-gated receptor ion channels (ELGs), rapidly transduce neurotransmitter-binding events into electrical signals, such as fast synaptic neurotransmission. ELG function is regulated by post-translational modification.
  • The second category, intracellular ligand-gated receptor ion channels (ILGs), are activated by many intracellular second messengers and do not require post-translational modification(s) to effect a channel-opening response.
  • ELGs depolarize excitable cells to the threshold of action potential generation. In non-excitable cells, ELGs permit a limited calcium ion-influx during the presence of agonist.
  • ELGs include channels directly gated by neurotransmitters such as acetylcholine, L-glutamate, glycine, ATP, serotonin, GABA, and histamine.
  • ELG genes encode proteins having strong structural and functional similarities. ILGs are encoded by distinct and unrelated gene families and include receptors for cAMP, cGMP, calcium ions, ATP, and metabolites of arachidonic acid.
Macrophage Scavenger Receptors
  • Macrophage scavenger receptors with broad ligand specificity may participate in the binding of low density lipoproteins (LDL) and foreign antigens.
  • Scavenger receptors types I and II are trimeric membrane proteins with each subunit containing a small N-terminal intracellular domain, a transmembrane domain, a large extracellular domain, and a C-terminal cysteine-rich domain.
  • The extracellular domain contains a short spacer domain, an α-helical coiled-coil domain, and a triple helical collagenous domain.
  • Ligands include chemically modified lipoproteins and albumin, polyribonucleotides, polysaccharides, phospholipids, and asbestos (Matsumoto, A. et al. (1990) Proc. Natl. Acad. Sci. USA 87:9133-9137; Elomaa, O. et al. (1995) Cell 80:603-609).
  • The scavenger receptors are thought to play a key role in atherogenesis by mediating uptake of modified LDL in arterial walls, and in host defense by binding bacterial endotoxins, bacteria, and protozoa.
T-Cell Receptors
  • T cells play a dual role in the immune system as effectors and regulators, coupling antigen recognition with the transmission of signals that induce cell death in infected cells and stimulate proliferation of other immune cells.
  • The T cell receptor (TCR) recognizes antigen in the context of a major histocompatibility complex (MHC) molecule.
  • Both TCR subunits have an extracellular domain containing both variable and constant regions, a transmembrane domain that traverses the membrane once, and a short intracellular domain (Saito, H. et al. (1984) Nature 309:757-762).
  • The genes for the TCR subunits are constructed through somatic rearrangement of different gene segments. Interaction of antigen in the proper MHC context with the TCR initiates signaling cascades that induce the proliferation, maturation, and function of cellular components of the immune system (Weiss, A. (1991) Annu. Rev. Genet. 25:487-510).
  • Rearrangements in TCR genes and alterations in TCR expression have been noted in lymphomas, leukemias, autoimmune disorders, and immunodeficiency disorders (Aisenberg, A.C. et al. (1985) N. Engl. J. Med. 313:529-533; Weiss, supra).
  • Intracellular signaling is the general process by which cells respond to extracellular signals (hormones, neurotransmitters, growth and differentiation factors, etc.) through a cascade of biochemical reactions that begins with the binding of a signaling molecule to a cell membrane receptor and ends with the activation of an intracellular target molecule.
  • Intermediate steps in the process involve the activation of various cytoplasmic proteins by phosphorylation via protein kinases, and their deactivation by protein phosphatases, and the eventual translocation of some of these activated proteins to the cell nucleus where the transcription of specific genes is triggered.
  • The intracellular signaling process regulates all types of cell functions including cell proliferation, cell differentiation, and gene transcription, and involves a diversity of molecules including protein kinases and phosphatases, and second messenger molecules, such as cyclic nucleotides, calcium-calmodulin, inositol, and various mitogens, that regulate protein phosphorylation.
  • Protein kinases and phosphatases play a key role in the intracellular signaling process by controlling the phosphorylation and activation of various signaling proteins.
  • The high energy phosphate for this reaction is generally transferred from the adenosine triphosphate molecule (ATP) to a particular protein by a protein kinase and removed from that protein by a protein phosphatase.
  • ATP adenosine triphosphate molecule
  • Protein kinases are roughly divided into two groups: those that phosphorylate tyrosine residues (protein tyrosine kinases, PTK) and those that phosphorylate serine or threonine residues (serine/threonine kinases, STK).
  • A few protein kinases have dual specificity for serine/threonine and tyrosine residues. Almost all kinases contain a conserved 250-300 amino acid catalytic domain containing specific residues and sequence motifs characteristic of the kinase family (Hardie, G. and S. Hanks (1995) The Protein Kinase Facts Books, Vol 1:7-20, Academic Press, San Diego CA).
  • STKs include the second messenger dependent protein kinases such as the cyclic-AMP dependent protein kinases (PKA), involved in mediating hormone-induced cellular responses; calcium-calmodulin (CaM) dependent protein kinases, involved in regulation of smooth muscle contraction, glycogen breakdown, and neurotransmission; and the mitogen-activated protein kinases (MAP) which mediate signal transduction from the cell surface to the nucleus via phosphorylation cascades.
  • PKA cyclic-AMP dependent protein kinases
  • CaM calcium-calmodulin dependent protein kinases
  • MAP mitogen-activated protein kinases
  • PTKs are divided into transmembrane, receptor PTKs and nontransmembrane, non-receptor PTKs.
  • Transmembrane PTKs are receptors for most growth factors.
  • Non-receptor PTKs lack transmembrane regions and, instead, form complexes with the intracellular regions of cell surface receptors.
  • Receptors that function through non-receptor PTKs include those for cytokines and hormones (growth hormone and prolactin) and antigen-specific receptors on T and B lymphocytes.
  • HPK histidine protein kinase family
  • HPKs bear little homology with mammalian STKs or PTKs but have distinctive sequence motifs of their own (Davie, J.R. et al. (1995) J. Biol. Chem. 270:19861-19867).
  • A histidine residue in the N-terminal half of the molecule (region I) is an autophosphorylation site.
  • Three additional motifs located in the C-terminal half of the molecule include an invariant asparagine residue in region II and two glycine-rich loops characteristic of nucleotide binding domains in regions III and IV. Recently a branched chain alpha-ketoacid dehydrogenase kinase has been found with characteristics of HPK in rat (Davie, supra).
  • The two principal categories of protein phosphatases are the protein (serine/threonine) phosphatases (PPs) and the protein tyrosine phosphatases (PTPs).
  • PPs dephosphorylate phosphoserine/threonine residues and are important regulators of many cAMP-mediated hormone responses (Cohen, P. (1989) Annu. Rev. Biochem. 58:453-508).
  • PTPs reverse the effects of protein tyrosine kinases and play a significant role in cell cycle and cell signaling processes (Charbonneau, supra).
  • PTPs may prevent or reverse cell transformation and the growth of various cancers by controlling the levels of tyrosine phosphorylation in cells. This hypothesis is supported by studies showing that overexpression of PTPs can suppress transformation in cells, and that specific inhibition of PTPs can enhance cell transformation (Charbonneau, supra).
  • Phospholipid and Inositol-Phosphate Signaling
  • Inositol phospholipids are involved in an intracellular signaling pathway that begins with binding of a signaling molecule to a G-protein linked receptor in the plasma membrane. This leads to the phosphorylation of phosphatidylinositol (PI) residues on the inner side of the plasma membrane to the biphosphate state (PIP2) by inositol kinases. Simultaneously, the G-protein linked receptor binding stimulates a trimeric G-protein which in turn activates a phosphoinositide-specific phospholipase C-β.
  • PI phosphatidylinositol
  • IP3 inositol triphosphate
  • ER endoplasmic reticulum
  • Diacylglycerol helps activate protein kinase C, an STK that phosphorylates selected proteins in the target cell.
  • The calcium response initiated by IP3 is terminated by the dephosphorylation of IP3 by specific inositol phosphatases.
  • Cellular responses that are mediated by this pathway are glycogen breakdown in the liver in response to vasopressin, smooth muscle contraction in response to acetylcholine, and thrombin-induced platelet aggregation.
  • Cyclic Nucleotide Signaling
  • Cyclic nucleotides function as intracellular second messengers to transduce a variety of extracellular signals including hormones, light, and neurotransmitters.
  • PKA cyclic-AMP dependent protein kinases
  • Adenylyl cyclase, which synthesizes cAMP from ATP, is activated to increase cAMP levels in muscle by binding of adrenaline to β-adrenergic receptors, while activation of guanylate cyclase and increased cGMP levels in photoreceptors leads to reopening of the Ca2+-specific channels and recovery of the dark state in the eye.
  • Hydrolysis of cyclic nucleotides by cAMP- and cGMP-specific phosphodiesterases (PDEs) produces the opposite of these and other effects mediated by increased cyclic nucleotide levels.
  • PDEs appear to be particularly important in the regulation of cyclic nucleotides, considering the diversity found in this family of proteins.
  • At least seven families of mammalian PDEs (PDE1-7) have been identified based on substrate specificity and affinity, sensitivity to cofactors, and sensitivity to inhibitory drugs (Beavo, J.A. (1995) Physiological Reviews 75:725-748).
  • PDE inhibitors have been found to be particularly useful in treating various clinical disorders.
  • Rolipram, a specific inhibitor of PDE4
  • Theophylline is a nonspecific PDE inhibitor used in the treatment of bronchial asthma and other respiratory diseases (Banner, K.H. and C.P. Page (1995) Eur. Respir. J. 8:996-1000).
  • G-proteins are critical mediators of signal transduction between a particular class of extracellular receptors, the G-protein coupled receptors (GPCR), and intracellular second messengers such as cAMP and Ca2+.
  • G-proteins are linked to the cytosolic side of a GPCR such that activation of the GPCR by ligand binding stimulates binding of the G-protein to GTP, inducing an "active" state in the G-protein.
  • The G-protein acts as a signal to trigger other events in the cell such as the increase of cAMP levels or the release of Ca2+ into the cytosol from the ER, which, in turn, regulate phosphorylation and activation of other intracellular proteins. Recycling of the G-protein to the inactive state involves hydrolysis of the bound GTP to GDP by a GTPase activity in the G-protein. (See Alberts, B. et al.
  • G-proteins consisting of three different subunits, and monomeric, low molecular weight (LMW), G-proteins consisting of a single polypeptide chain.
  • LMW low molecular weight
  • The three polypeptide subunits of heterotrimeric G-proteins are the α, β, and γ subunits.
  • The α subunit binds and hydrolyzes GTP.
  • The β and γ subunits form a tight complex that anchors the protein to the inner side of the plasma membrane.
  • The β subunits, also known as G-β proteins or β transducins, contain seven tandem repeats of the WD-repeat sequence motif, a motif found in many proteins with regulatory functions. Mutations and variant expression of β transducin proteins are linked with various disorders (Neer, E.J. et al. (1994) Nature 371:297-300; Margottin, F. et al. (1998)
  • LMW GTP-proteins are GTPases which regulate cell growth, cell cycle control, protein secretion, and intracellular vesicle interaction. They consist of single polypeptides which, like the α subunit of the heterotrimeric G-proteins, are able to bind and hydrolyze GTP, thus cycling between an inactive and an active state. At least sixty members of the LMW G-protein superfamily have been identified and are currently grouped into the six subfamilies of ras, rho, arf, sar1, ran, and rab. Activated ras genes were initially found in human cancers, and subsequent studies confirmed that ras function is critical in determining whether cells continue to grow or become differentiated. Other members of the LMW G-protein superfamily have roles in signal transduction that vary with the function of the activated genes and the locations of the G-proteins.
  • Guanine nucleotide exchange factors regulate the activities of LMW G-proteins by determining whether GTP or GDP is bound.
  • GTPase-activating protein GAP
  • GTP-ras GTPase-activating protein
  • GNRP guanine nucleotide releasing protein
  • RGS G-protein signaling
  • Ca2+ is another second messenger molecule that is even more widely used as an intracellular mediator than cAMP.
  • Ca2+ directly activates regulatory enzymes, such as protein kinase C, which trigger signal transduction pathways.
  • Ca2+ also binds to specific Ca2+-binding proteins (CBPs) such as calmodulin (CaM) which then activate multiple target proteins in the cell including enzymes, membrane transport pumps, and ion channels.
  • CBPs Ca2+-binding proteins
  • CaM interactions are involved in a multitude of cellular processes including, but not limited to, gene regulation, DNA synthesis, cell cycle progression, mitosis, cytokinesis, cytoskeletal organization, muscle contraction, signal transduction, ion homeostasis, exocytosis, and metabolic regulation (Celio, M.R. et al. (1996) Guidebook to Calcium-binding Proteins, Oxford University Press, Oxford, UK, pp. 15-20).
  • Calsequestrin is one such CBP that is expressed in isoforms specific to cardiac muscle and skeletal muscle. It is suggested that calsequestrin binds Ca2+ in a rapidly exchangeable state that is released during Ca2+-signaling conditions (Celio, M.R. et al. (1996) Guidebook to Calcium-binding Proteins, Oxford University Press, New York NY, pp. 222-224).
  • Cyclins
  • Cell division is the fundamental process by which all living things grow and reproduce. In most organisms, the cell cycle consists of three principal steps: interphase, mitosis, and cytokinesis. Interphase involves preparations for cell division, replication of the DNA, and production of essential proteins. In mitosis, the nuclear material is divided and separates to opposite sides of the cell. Cytokinesis is the final division and fission of the cell cytoplasm to produce the daughter cells.
  • Cell cyclin-dependent protein kinases
  • cyclin B which controls entry of the cell into mitosis
  • G1 cyclin which controls events that drive the cell out of mitosis.
  • Certain proteins in intracellular signaling pathways serve to link or cluster other proteins involved in the signaling cascade.
  • A conserved protein domain called the PDZ domain has been identified in various membrane-associated signaling proteins. This domain has been implicated in receptor and ion channel clustering and in the targeting of multiprotein signaling complexes to specialized functional regions of the cytosolic face of the plasma membrane. (For a review of PDZ domain-containing proteins, see Ponting, C.P. et al.
  • PDZ domains are found in the eukaryotic MAGUK (membrane-associated guanylate kinase) protein family, members of which bind to the intracellular domains of receptors and channels.
  • PDZ domains are also found in diverse membrane-localized proteins such as protein tyrosine phosphatases, serine/threonine kinases, G-protein cofactors, and synapse-associated proteins such as syntrophins and neuronal nitric oxide synthase (nNOS).
  • Generally about one to three PDZ domains are found in a given protein, although up to nine PDZ domains have been identified in a single protein.
  • The plasma membrane acts as a barrier to most molecules. Transport between the cytoplasm and the extracellular environment, and between the cytoplasm and lumenal spaces of cellular organelles requires specific transport proteins. Each transport protein carries a particular class of molecule, such as ions, sugars, or amino acids, and often is specific to a certain molecular species of the class. A variety of human inherited diseases are caused by a mutation in a transport protein. For example, cystinuria is an inherited disease that results from the inability to transport cystine, the disulfide-linked dimer of cysteine, from the urine into the blood. Accumulation of cystine in the urine leads to the formation of cystine stones in the kidneys.
  • Transport proteins are multi-pass transmembrane proteins, which either actively transport molecules across the membrane or passively allow them to cross. Active transport involves directional pumping of a solute across the membrane, usually against an electrochemical gradient. Active transport is tightly coupled to a source of metabolic energy, such as ATP hydrolysis or an electrochemically favorable ion gradient. Passive transport involves the movement of a solute down its electrochemical gradient. Transport proteins can be further classified as either carrier proteins or channel proteins. Carrier proteins, which can function in active or passive transport, bind to a specific solute to be transported and undergo a conformational change which transfers the bound solute across the membrane. Channel proteins, which only function in passive transport, form hydrophilic pores across the membrane. When the pores open, specific solutes, such as inorganic ions, pass through the membrane and down the electrochemical gradient of the solute.
  • Carrier proteins which transport a single solute from one side of the membrane to the other are called uniporters.
  • Coupled transporters link the transfer of one solute with simultaneous or sequential transfer of a second solute, either in the same direction (symport) or in the opposite direction (antiport).
  • Intestinal and kidney epithelium contains a variety of symporter systems driven by the sodium gradient that exists across the plasma membrane. Sodium moves into the cell down its electrochemical gradient and brings the solute into the cell with it. The sodium gradient that provides the driving force for solute uptake is maintained by the ubiquitous Na+/K+ ATPase.
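The uniport, symport, and antiport distinction described above can be illustrated with a minimal Python sketch. The names `TransportMode` and `classify_carrier`, and the "in"/"out" direction labels, are hypothetical and introduced only for illustration; they are not part of the disclosure.

```python
from enum import Enum


class TransportMode(Enum):
    UNIPORT = "uniport"    # a single solute is carried
    SYMPORT = "symport"    # coupled solutes move in the same direction
    ANTIPORT = "antiport"  # coupled solutes move in opposite directions


def classify_carrier(directions):
    """Classify a carrier from the direction ('in' or 'out') of each solute.

    `directions` holds one entry per transported solute. This is an
    illustrative simplification: it ignores stoichiometry and energetics.
    """
    if not directions or any(d not in ("in", "out") for d in directions):
        raise ValueError("each direction must be 'in' or 'out'")
    if len(directions) == 1:
        return TransportMode.UNIPORT
    if len(set(directions)) == 1:
        return TransportMode.SYMPORT
    return TransportMode.ANTIPORT


# Sodium-driven solute uptake: Na+ and the solute both enter the cell.
print(classify_carrier(["in", "in"]))   # TransportMode.SYMPORT
# Exchange of one ion for another across the membrane.
print(classify_carrier(["in", "out"]))  # TransportMode.ANTIPORT
```

The sketch treats any carrier with two or more solutes as coupled, which matches the text's definition but not the full diversity of real transporters.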
  • Sodium-coupled transporters include the mammalian glucose transporter (SGLT1), iodide transporter (NIS), and multivitamin transporter (SMVT).
  • SGLT1 mammalian glucose transporter
  • NIS iodide transporter
  • SMVT multivitamin transporter
  • All three transporters have twelve putative transmembrane segments, extracellular glycosylation sites, and cytoplasmically-oriented N- and C-termini.
  • NIS plays a crucial role in the evaluation, diagnosis, and treatment of various thyroid pathologies because it is the molecular basis for radioiodide thyroid-imaging techniques and for specific targeting of radioisotopes to the thyroid gland (Levy, O. et al. (1997) Proc. Natl. Acad. Sci.
  • SMVT is expressed in the intestinal mucosa, kidney, and placenta, and is implicated in the transport of the water-soluble vitamins, e.g., biotin and pantothenate (Prasad, P.D. et al. (1998) J. Biol. Chem. 273:7501-7506).
  • Monocarboxylate anion transporters are proton-coupled symporters with a broad substrate specificity that includes L-lactate, pyruvate, and the ketone bodies acetate, acetoacetate, and beta-hydroxybutyrate. At least seven isoforms have been identified to date.
  • The isoforms are predicted to have twelve transmembrane (TM) helical domains with a large intracellular loop between TM6 and TM7, and play a critical role in maintaining intracellular pH by removing the protons that are produced stoichiometrically with lactate during glycolysis.
  • TM transmembrane
  • H(+)-monocarboxylate transporter is that of the erythrocyte membrane, which transports L-lactate and a wide range of other aliphatic monocarboxylates.
  • Other cells possess H(+)-linked monocarboxylate transporters with differing substrate and inhibitor selectivities.
  • Cardiac muscle and tumor cells have transporters that differ in their Km values for certain substrates, including stereoselectivity for L- over D-lactate, and in their sensitivity to inhibitors.
  • Na(+)-monocarboxylate cotransporters on the luminal surface of intestinal and kidney epithelia, which allow the uptake of lactate, pyruvate, and ketone bodies in these tissues.
  • Organic anion transporters are selective for hydrophobic, charged molecules with electron-attracting side groups.
  • Organic cation transporters, such as the ammonium transporter, mediate the secretion of a variety of drugs and endogenous metabolites, and contribute to the maintenance of intercellular pH.
  • ABC transporters can transport substances that differ markedly in chemical structure and size, ranging from small molecules such as ions, sugars, amino acids, peptides, and phospholipids, to lipopeptides, large proteins, and complex hydrophobic drugs.
  • ABC proteins consist of four modules: two nucleotide-binding domains (NBD), which hydrolyze ATP to supply the energy required for transport, and two membrane-spanning domains (MSD), each containing six putative transmembrane segments. These four modules may be encoded by a single gene, as is the case for the cystic fibrosis transmembrane regulator (CFTR), or by separate genes.
  • NBD nucleotide-binding domains
  • MSD membrane-spanning domains
  • Each gene product contains a single NBD and MSD. These "half-molecules" form homo- and heterodimers, such as Tap1 and Tap2, the endoplasmic reticulum-based major histocompatibility (MHC) peptide transport system.
  • MHC major histocompatibility
  • MDR multidrug resistance
  • Fatty acid transport protein (FATP), an integral membrane protein with four transmembrane segments, is expressed in tissues exhibiting high levels of plasma membrane fatty acid flux, such as muscle, heart, and adipose. Expression of FATP is upregulated in 3T3-L1 cells during adipose conversion, and expression in COS7 fibroblasts elevates uptake of long-chain fatty acids (Hui, T.Y. et al. (1998) J. Biol. Chem. 273:27420-27429).
  • Ion Channels
  • The electrical potential of a cell is generated and maintained by controlling the movement of ions across the plasma membrane.
  • The movement of ions requires ion channels, which form an ion-selective pore within the membrane.
  • There are two basic types of ion channels, ion transporters and gated ion channels.
  • Ion transporters utilize the energy obtained from ATP hydrolysis to actively transport an ion against the ion's concentration gradient.
  • Gated ion channels allow passive flow of an ion down the ion's electrochemical gradient under restricted conditions.
  • Ion transporters generate and maintain the resting electrical potential of a cell. Utilizing the energy derived from ATP hydrolysis, they transport ions against the ion's concentration gradient. These transmembrane ATPases are divided into three families.
  • The phosphorylated (P) class ion transporters, including Na+-K+ ATPase, Ca2+-ATPase, and H+-ATPase, are activated by a phosphorylation event.
  • P-class ion transporters are responsible for maintaining resting potential distributions such that cytosolic concentrations of Na+ and Ca2+ are low and cytosolic concentration of K+ is high.
  • The vacuolar (V) class of ion transporters includes H+ pumps on intracellular organelles, such as lysosomes and Golgi. V-class ion transporters are responsible for generating the low pH within the lumen of these organelles that is required for function.
  • The coupling factor (F) class consists of H+ pumps in the mitochondria.
  • F-class ion transporters utilize a proton gradient to generate ATP from ADP and inorganic phosphate (Pi).
  • The resting potential of the cell is utilized in many processes involving carrier proteins and gated ion channels.
  • Carrier proteins utilize the resting potential to transport molecules into and out of the cell.
  • Amino acid and glucose transport into many cells is linked to sodium ion co-transport (symport) so that the movement of Na+ down an electrochemical gradient drives transport of the other molecule up a concentration gradient.
  • Cardiac muscle links transfer of Ca2+ out of the cell with transport of Na+ into the cell (antiport).
  • Ion channels share common structural and mechanistic themes.
  • The channel consists of four or five subunits or protein monomers that are arranged like a barrel in the plasma membrane.
  • Each subunit typically consists of six potential transmembrane segments (S1, S2, S3, S4, S5, and S6).
  • The center of the barrel forms a pore lined by α-helices or β-strands.
  • The side chains of the amino acid residues comprising the α-helices or β-strands establish the charge (cation or anion) selectivity of the channel.
  • The degree of selectivity, or what specific ions are allowed to pass through the channel, depends on the diameter of the narrowest part of the pore.
  • Gated ion channels control ion flow by regulating the opening and closing of pores. These channels are categorized according to the manner of regulating the gating function. Mechanically-gated channels open pores in response to mechanical stress, voltage-gated channels open pores in response to changes in membrane potential, and ligand-gated channels open pores in the presence of a specific ion, nucleotide, or neurotransmitter.
  • Voltage-gated Na+ and K+ channels are necessary for the function of electrically excitable cells, such as nerve and muscle cells.
  • Action potentials, which lead to neurotransmitter release and muscle contraction, arise from large, transient changes in the permeability of the membrane to Na+ and K+ ions.
  • Depolarization of the membrane beyond the threshold level opens voltage-gated Na+ channels.
  • Sodium ions flow into the cell, further depolarizing the membrane and opening more voltage-gated Na+ channels, which propagates the depolarization down the length of the cell.
  • Depolarization also opens voltage-gated potassium channels. Consequently, potassium ions flow outward, which leads to repolarization of the membrane.
  • Voltage-gated channels utilize charged residues in the fourth transmembrane segment (S4) to sense voltage change.
  • The open state lasts only about 1 millisecond, at which time the channel spontaneously converts into an inactive state that cannot be opened irrespective of the membrane potential.
  • Inactivation is mediated by the channel's N-terminus, which acts as a plug that closes the pore. The transition from an inactive to a closed state requires a return to resting potential.
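The closed, open, and inactive states of a voltage-gated channel described above can be sketched as a small state machine. The names `ChannelState` and `step` are hypothetical, and the sketch assumes that each call to `step` represents a time interval longer than the roughly 1 millisecond open lifetime, so an open channel always inactivates on the next step.

```python
from enum import Enum


class ChannelState(Enum):
    CLOSED = "closed"      # can be opened by depolarization
    OPEN = "open"          # conducts ions; short-lived
    INACTIVE = "inactive"  # pore plugged; cannot be opened


def step(state, above_threshold, at_resting_potential):
    """Advance the channel by one time step (illustrative only).

    above_threshold: membrane is depolarized beyond the threshold level.
    at_resting_potential: membrane has returned to resting potential.
    """
    if state is ChannelState.CLOSED:
        return ChannelState.OPEN if above_threshold else ChannelState.CLOSED
    if state is ChannelState.OPEN:
        # The open state spontaneously converts to the inactive state.
        return ChannelState.INACTIVE
    # INACTIVE: only a return to resting potential restores the closed state.
    return ChannelState.CLOSED if at_resting_potential else ChannelState.INACTIVE


state = ChannelState.CLOSED
state = step(state, above_threshold=True, at_resting_potential=False)   # opens
state = step(state, above_threshold=True, at_resting_potential=False)   # inactivates
state = step(state, above_threshold=True, at_resting_potential=False)   # stays inactive
state = step(state, above_threshold=False, at_resting_potential=True)   # closes again
print(state)  # ChannelState.CLOSED
```

Real channel gating is stochastic and voltage-dependent in a graded way; the boolean inputs here are a deliberate simplification.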
  • Voltage-gated Na+ channels are heterotrimeric complexes composed of a 260 kDa pore forming α subunit that associates with two smaller auxiliary subunits, β1 and β2.
  • The β2 subunit is an integral membrane glycoprotein that contains an extracellular Ig domain, and its association with α and β1 subunits correlates with increased functional expression of the channel, a change in its gating properties, and an increase in whole cell capacitance due to an increase in membrane surface area.
  • Voltage-gated Ca2+ channels are involved in presynaptic neurotransmitter release, and heart and skeletal muscle contraction.
  • The voltage-gated Ca2+ channels from skeletal muscle (L-type) and brain (N-type) have been purified, and though their functions differ dramatically, they have similar subunit compositions.
  • The channels are composed of three subunits.
  • The α1 subunit forms the membrane pore and voltage sensor, while the α2δ and β subunits modulate the voltage-dependence, gating properties, and the current amplitude of the channel.
  • These subunits are encoded by at least six α1, one α2δ, and four β genes.
  • A fourth subunit, γ, has been identified in skeletal muscle. (Walker, D.
  • Chloride channels are necessary in endocrine secretion and in regulation of cytosolic and organelle pH.
  • Cl- enters the cell across a basolateral membrane through an Na+, K+/Cl- cotransporter, accumulating in the cell above its electrochemical equilibrium concentration.
  • The cystic fibrosis transmembrane conductance regulator (CFTR) is a chloride channel encoded by the gene for cystic fibrosis, a common fatal genetic disorder in humans. Loss of CFTR function decreases transepithelial water secretion and, as a result, the layers of mucus that coat the respiratory tree, pancreatic ducts, and intestine are dehydrated and difficult to clear. The resulting blockage of these sites leads to pancreatic insufficiency, "meconium ileus", and devastating "chronic obstructive pulmonary disease" (Al-Awqati, Q. et al. (1992) J. Exp. Biol. 172:245-266).
  • H+-ATPase pumps that generate transmembrane pH and electrochemical differences by moving protons from the cytosol to the organelle lumen. If the membrane of the organelle is permeable to other ions, then the electrochemical gradient can be abrogated without affecting the pH differential. In fact, removal of the electrochemical barrier allows more H+ to be pumped across the membrane, increasing the pH differential.
  • Cl- is the sole counterion of H+ translocation in a number of organelles, including chromaffin granules, Golgi vesicles, lysosomes, and endosomes.
  • Functions that require a low vacuolar pH include uptake of small molecules such as biogenic amines in chromaffin granules, processing of vacuolar constituents such as pro-hormones by proteolytic enzymes, and protein degradation in lysosomes (Al-Awqati, supra).
  • Ligand-gated channels open their pores when an extracellular or intracellular mediator binds to the channel.
  • Neurotransmitter-gated channels are channels that open when a neurotransmitter binds to their extracellular domain. These channels exist in the postsynaptic membrane of nerve or muscle cells.
  • Chloride channels open in response to inhibitory neurotransmitters, such as γ-aminobutyric acid (GABA) and glycine, leading to hyperpolarization of the membrane and the subsequent generation of an action potential.
  • Ligand-gated channels can be regulated by intracellular second messengers.
  • Calcium-activated K+ channels are gated by internal calcium ions. In nerve cells, an influx of calcium during depolarization opens K+ channels to modulate the magnitude of the action potential (Ishi, T.M. et al.
  • Cyclic nucleotide-gated (CNG) channels are gated by cytosolic cyclic nucleotides.
  • CNG Cyclic nucleotide-gated
  • Ion channels are expressed in a number of tissues where they are implicated in a variety of processes. CNG channels, while abundantly expressed in photoreceptor and olfactory sensory cells, are also found in kidney, lung, pineal, retinal ganglion cells, testis, aorta, and brain.
  • Calcium-activated K+ channels may be responsible for the vasodilatory effects of bradykinin in the kidney and for shunting excess K+ from brain capillary endothelial cells into the blood. They are also implicated in repolarizing granulocytes after agonist-stimulated depolarization (Ishi, supra). Ion channels have been the target for many drug therapies. Neurotransmitter-gated channels have been targeted in therapies for treatment of insomnia, anxiety, depression, and schizophrenia. Voltage-gated channels have been targeted in therapies for arrhythmia, ischemic stroke, head trauma, and neurodegenerative disease
  • The cellular processes regulating modification and maintenance of protein molecules coordinate their conformation, stabilization, and degradation. Each of these processes is mediated by key enzymes or proteins such as proteases, protease inhibitors, transferases, isomerases, and molecular chaperones.
  • Proteases cleave proteins and peptides at the peptide bond that forms the backbone of the peptide and protein chain.
  • Proteolytic processing is essential to cell growth, differentiation, remodeling, and homeostasis as well as inflammation and immune response.
  • Typical protein half-lives range from hours to a few days, so that within all living cells, precursor proteins are being cleaved to their active form, signal sequences proteolytically removed from targeted proteins, and aged or defective proteins degraded by proteolysis.
  • Proteases function in bacterial, parasitic, and viral invasion and replication within a host.
  • SPs serine proteases
  • SPs include the digestive enzymes trypsin and chymotrypsin, components of the complement cascade and the blood-clotting cascade, and enzymes that control extracellular protein degradation.
  • The main SP sub-families are trypases, which cleave after arginine or lysine; aspartases, which cleave after aspartate; chymases, which cleave after phenylalanine or leucine; metases, which cleave after methionine; and serases, which cleave after serine.
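The residue preferences listed for the serine protease sub-families can be expressed as a lookup table and applied to a peptide in one-letter amino acid code. This is a simplified sketch: the names `CLEAVAGE_RESIDUES` and `cleave` are hypothetical, and the code assumes cleavage occurs after every matching residue, ignoring the sequence context and structural accessibility that affect real proteases.

```python
# Sub-family name -> one-letter codes of residues after which it cleaves,
# as listed in the text (R = arginine, K = lysine, D = aspartate,
# F = phenylalanine, L = leucine, M = methionine, S = serine).
CLEAVAGE_RESIDUES = {
    "trypase": "RK",
    "aspartase": "D",
    "chymase": "FL",
    "metase": "M",
    "serase": "S",
}


def cleave(peptide, subfamily):
    """Split `peptide` after each residue recognized by `subfamily`."""
    residues = CLEAVAGE_RESIDUES[subfamily]
    fragments = []
    start = 0
    for i, amino_acid in enumerate(peptide):
        # A match at the last position leaves nothing to cut off.
        if amino_acid in residues and i < len(peptide) - 1:
            fragments.append(peptide[start:i + 1])
            start = i + 1
    fragments.append(peptide[start:])
    return fragments


print(cleave("AKGRM", "trypase"))  # ['AK', 'GR', 'M']
print(cleave("AKGRM", "metase"))   # ['AKGRM']
```

A table-driven design keeps the specificity rules in one place, so additional sub-families could be added without changing the cleavage logic.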
  • Enterokinase, the initiator of intestinal digestion, is a serine protease found in the intestinal brush border, where it cleaves the acidic propeptide from trypsinogen to yield active trypsin (Kitamoto, Y. et al. (1994) Proc. Natl. Acad. Sci. USA 91:7588-7592).
  • Prolylcarboxypeptidase, a lysosomal serine peptidase that cleaves peptides such as angiotensin II and III and [des-Arg9] bradykinin, shares sequence homology with members of both the serine carboxypeptidase and prolylendopeptidase families (Tan, F. et al. (1993) J. Biol. Chem. 268:16631-16638).
  • Cysteine proteases have a cysteine as the major catalytic residue at an active site where catalysis proceeds via an intermediate thiol ester and is facilitated by adjacent histidine and aspartic acid residues.
  • CPs are involved in diverse cellular processes ranging from the processing of precursor proteins to intracellular degradation. Mammalian CPs include lysosomal cathepsins and cytosolic calcium activated proteases, calpains.
  • CPs are produced by monocytes, macrophages and other cells of the immune system which migrate to sites of inflammation and secrete molecules involved in tissue repair. Overabundance of these repair molecules plays a role in certain disorders.
  • In autoimmune diseases such as rheumatoid arthritis, secretion of the cysteine peptidase cathepsin C degrades collagen, laminin, elastin and other structural proteins found in the extracellular matrix of bones.
  • Aspartic proteases are members of the cathepsin family of lysosomal proteases and include pepsin A, gastricsin, chymosin, renin, and cathepsins D and E. Aspartic proteases have a pair of aspartic acid residues in the active site, and are most active in the pH 2-3 range, in which one of the aspartate residues is ionized, the other un-ionized. Aspartic proteases include bacterial penicillopepsin, mammalian pepsin, renin, chymosin, and certain fungal proteases. Abnormal regulation and expression of cathepsins is evident in various inflammatory disease states.
  • In cells isolated from inflamed synovia, the mRNA for stromelysin, cytokines, TIMP-1, cathepsin, gelatinase, and other molecules is preferentially expressed.
  • Expression of cathepsins L and D is elevated in synovial tissues from patients with rheumatoid arthritis and osteoarthritis.
  • Cathepsin L expression may also contribute to the influx of mononuclear cells which exacerbates the destruction of the rheumatoid synovium. (Keyszer, G.M. (1995) Arthritis Rheum.
  • Metalloproteases have active sites that include two glutamic acid residues and one histidine residue that serve as binding sites for zinc.
  • Carboxypeptidases A and B are the principal mammalian metalloproteases. Both are exoproteases of similar structure and active sites.
  • Carboxypeptidase A, like chymotrypsin, prefers C-terminal aromatic and aliphatic side chains of hydrophobic nature, whereas carboxypeptidase B is directed toward basic arginine and lysine residues.
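The substrate preferences of carboxypeptidases A and B described above can be illustrated with a minimal sketch. The function name and the exact residue sets are assumptions chosen for illustration (one-letter amino acid codes); real enzyme specificity is graded rather than all-or-nothing.

```python
# Illustrative residue classes (an assumption, not an exhaustive biochemical definition).
AROMATIC_OR_ALIPHATIC = set("FWYLIVA")  # hydrophobic side chains favored by carboxypeptidase A
BASIC = set("RK")                       # arginine and lysine, favored by carboxypeptidase B


def preferred_carboxypeptidase(peptide: str) -> str:
    """Return 'A', 'B', or 'neither' based on the C-terminal residue of the peptide."""
    if not peptide:
        raise ValueError("peptide must contain at least one residue")
    c_terminal = peptide[-1].upper()
    if c_terminal in BASIC:
        return "B"
    if c_terminal in AROMATIC_OR_ALIPHATIC:
        return "A"
    return "neither"
```

Because both enzymes are exoproteases, only the last residue of the chain is inspected.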
  • Ubiquitin proteases are associated with the ubiquitin conjugation system (UCS), a major pathway for the degradation of cellular proteins in eukaryotic cells and some bacteria.
  • Proteins targeted for degradation are conjugated to ubiquitin, a small, heat-stable protein.
  • The ubiquitinated protein is then recognized and degraded by the proteasome, a large, multisubunit proteolytic enzyme complex, and ubiquitin is released for reutilization by ubiquitin protease.
  • The UCS is implicated in the degradation of mitotic cyclic kinases, oncoproteins, tumor suppressor genes such as p53, viral proteins, cell surface receptors associated with signal transduction, transcriptional regulators, and mutated or damaged proteins (Ciechanover, A. (1994) Cell 79:13-21).
  • A murine proto-oncogene, Unp, encodes a nuclear ubiquitin protease whose overexpression leads to oncogenic transformation of NIH3T3 cells, and the human homolog of this gene is consistently elevated in small cell tumors and adenocarcinomas of the lung (Gray, D.A. (1995) Oncogene 10:2179-2183).
  Signal Peptidases
  • The mechanism for the translocation process into the endoplasmic reticulum (ER) involves the recognition of an N-terminal signal peptide on the elongating protein. The signal peptide directs the protein and attached ribosome to a receptor on the ER membrane.
  • The polypeptide chain passes through a pore in the ER membrane into the lumen while the N-terminal signal peptide remains attached at the membrane surface. The process is completed when signal peptidase located inside the ER cleaves the signal peptide from the protein and releases the protein into the lumen.
  • Protease inhibitors and other regulators of protease activity control the activity and effects of proteases.
  • Protease inhibitors have been shown to control pathogenesis in animal models of proteolytic disorders (Murphy, G. (1991) Agents Actions Suppl. 35:69-76).
  • Serpins are inhibitors of mammalian plasma serine proteases. Many serpins serve to regulate the blood clotting cascade and/or the complement cascade in mammals.
  • Sp32 is a positive regulator of the mammalian acrosomal protease, acrosin, that binds the proenzyme, proacrosin, and thereby aids in packaging the enzyme into the acrosomal matrix (Baba, T. et al. (1994) J. Biol. Chem. 269:10133-10140).
  • The Kunitz family of serine protease inhibitors is characterized by one or more "Kunitz domains" containing a series of cysteine residues that are regularly spaced over approximately 50 amino acid residues and form three intrachain disulfide bonds.
  • Members of the Kunitz family include aprotinin, tissue factor pathway inhibitor (TFPI-1 and TFPI-2), inter-α-trypsin inhibitor, and bikunin.
  • All proteins synthesized in eukaryotic cells are synthesized on the cytosolic surface of the endoplasmic reticulum (ER). Before these immature proteins are distributed to other organelles in the cell or are secreted, they must be transported into the interior lumen of the ER where post-translational modifications are performed. These modifications include protein folding, the formation of disulfide bonds, and N-linked glycosylations.
  Protein Isomerases
  • Protein folding in the ER is aided by two principal types of protein isomerases, protein disulfide isomerase (PDI), and peptidyl-prolyl isomerase (PPI).
  • PDI catalyzes the oxidation of free sulfhydryl groups in cysteine residues to form intramolecular disulfide bonds in proteins.
  • PPI, an enzyme that catalyzes the isomerization of certain proline imidic bonds in oligopeptides and proteins, is considered to govern one of the rate limiting steps in the folding of many proteins to their final functional conformation.
  • The cyclophilins represent a major class of PPI that was originally identified as the major receptor for the immunosuppressive drug cyclosporin A (Handschumacher, R.E. et al.
  • An additional glycosylation mechanism operates in the ER specifically to target lysosomal enzymes to lysosomes and prevent their secretion.
  • Lysosomal enzymes in the ER receive an N-linked oligosaccharide, like plasma membrane and secreted proteins, but are then phosphorylated on one or two mannose residues.
  • The phosphorylation of mannose residues occurs in two steps, the first step being the addition of an N-acetylglucosamine phosphate residue by N-acetylglucosamine phosphotransferase, and the second the removal of the N-acetylglucosamine group by phosphodiesterase.
  • The phosphorylated mannose residue then targets the lysosomal enzyme to a mannose 6-phosphate receptor which transports it to a lysosome vesicle (Lodish, supra, pp. 708-711).
  Chaperones
  • Chaperones are proteins that aid in the proper folding of immature proteins and refolding of improperly folded ones, the assembly of protein subunits, and in the transport of unfolded proteins across membranes. Chaperones are also called heat-shock proteins (hsp) because of their tendency to be expressed in dramatically increased amounts following brief exposure of cells to elevated temperatures. This latter property most likely reflects their need in the refolding of proteins that have become denatured by the high temperatures. Chaperones may be divided into several classes according to their location, function, and molecular weight, and include hsp60, TCP1, hsp70, hsp40 (also called DnaJ), and hsp90.
  • Hsp90 binds to steroid hormone receptors, represses transcription in the absence of the ligand, and provides proper folding of the ligand-binding domain of the receptor in the presence of the hormone (Burston, S.G. and A.R. Clarke (1995) Essays Biochem. 29:125-136).
  • Hsp60 and hsp70 chaperones aid in the transport and folding of newly synthesized proteins.
  • Hsp70 acts early in protein folding, binding a newly synthesized protein before it leaves the ribosome and transporting the protein to the mitochondria or ER before releasing the folded protein.
  • Hsp60, along with hsp10, binds misfolded proteins and gives them the opportunity to refold correctly.
  • All chaperones share an affinity for hydrophobic patches on incompletely folded proteins and the ability to hydrolyze ATP.
  • The energy of ATP hydrolysis is used to release the hsp-bound protein in its properly folded state (Alberts, supra, pp 214, 571-572).
  • DNA and RNA replication are critical processes for cell replication and function.
  • DNA and RNA replication are mediated by the enzymes DNA and RNA polymerase, respectively, by a "templating" process in which the nucleotide sequence of a DNA or RNA strand is copied by complementary base-pairing into a complementary nucleic acid sequence of either DNA or RNA.
  • DNA polymerase catalyzes the stepwise addition of a deoxyribonucleotide to the 3'-OH end of a polynucleotide strand (the primer strand) that is paired to a second (template) strand.
  • The new DNA strand therefore grows in the 5' to 3' direction (Alberts, B. et al. (1994) The Molecular Biology of the Cell, Garland Publishing Inc., New York NY, pp. 251-254).
  • The substrates for the polymerization reaction are the corresponding deoxynucleotide triphosphates, which must base-pair with the correct nucleotide on the template strand in order to be recognized by the polymerase.
  • Because DNA exists as a double-stranded helix, each of the two strands may serve as a template for the formation of a new complementary strand. Each of the two daughter cells of the dividing cell therefore inherits a new DNA double helix containing one old and one new strand. DNA is thus said to be replicated "semiconservatively" by DNA polymerase.
  • DNA polymerase is also involved in the repair of damaged DNA as discussed below under "Ligases."
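The templating and semiconservative replication described above can be sketched in a few lines. This is a toy model under stated assumptions: strands are plain strings written 5' to 3', and the function names `template_copy` and `replicate` are hypothetical, not taken from any library.

```python
# Watson-Crick pairing rules for DNA.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def template_copy(template: str) -> str:
    """Return the strand templated by `template`, written 5'->3'.

    The template is read 3'->5' while the new strand grows 5'->3',
    so the product is the reverse complement of the template.
    """
    return "".join(PAIR[base] for base in reversed(template.upper()))


def replicate(duplex):
    """Semiconservative replication: each daughter duplex keeps one old strand."""
    old_top, old_bottom = duplex
    daughter_1 = (old_top, template_copy(old_top))        # old top + new bottom
    daughter_2 = (template_copy(old_bottom), old_bottom)  # new top + old bottom
    return daughter_1, daughter_2
```

For a correctly paired parent duplex, both daughters carry the same sequence as the parent, and each contains exactly one of the two original strands.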
  • RNA polymerase uses a DNA template strand to "transcribe" DNA into RNA using ribonucleotide triphosphates as substrates. Like DNA polymerization, RNA polymerization proceeds in a 5' to 3' direction by addition of a ribonucleoside monophosphate to the 3'-OH end of a growing RNA chain. DNA transcription generates messenger RNAs (mRNA) that carry information for protein synthesis, as well as the transfer, ribosomal, and other RNAs that have structural or catalytic functions. In eukaryotes, three discrete RNA polymerases synthesize the three different types of RNA (Alberts, supra, pp. 367-368).
  • RNA polymerase I makes the large ribosomal RNAs.
  • RNA polymerase II makes the mRNAs that will be translated into proteins.
  • RNA polymerase III makes a variety of small, stable RNAs, including 5S ribosomal RNA and the transfer RNAs (tRNA).
  • RNA synthesis is initiated by binding of the RNA polymerase to a promoter region on the DNA and synthesis begins at a start site within the promoter. Synthesis is completed at a broad, general stop or termination region in the DNA where both the polymerase and the completed RNA chain are released.
  • DNA repair is the process by which accidental base changes, such as those produced by oxidative damage, hydrolytic attack, or uncontrolled methylation of DNA, are corrected before replication or transcription of the DNA can occur. Because of the efficiency of the DNA repair process, fewer than one in one thousand accidental base changes causes a mutation (Alberts, supra, pp. 245-249).
  • The three steps common to most types of DNA repair are (1) excision of the damaged or altered base or nucleotide by DNA nucleases, leaving a gap; (2) insertion of the correct nucleotide in this gap by DNA polymerase using the complementary strand as the template; and (3) sealing the break left between the inserted nucleotide(s) and the existing DNA strand by DNA ligase.
  • DNA ligase uses the energy from ATP hydrolysis to activate the 5' end of the broken phosphodiester bond before forming the new bond with the 3'-OH of the DNA strand.
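The three repair steps listed above (excision, templated fill-in, sealing) can be traced in a toy model. The sketch below makes simplifying assumptions: the two strands are aligned position by position (strand polarity is ignored), a gap is represented by `None`, and all function names are hypothetical.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def excise(strand: str, index: int) -> list:
    """Step 1 (nuclease): remove the damaged base, leaving a gap (None)."""
    gapped = list(strand)
    gapped[index] = None
    return gapped


def fill_gap(gapped: list, template: str) -> list:
    """Step 2 (polymerase): insert the base that pairs with the template at each gap."""
    return [PAIR[template[i]] if base is None else base
            for i, base in enumerate(gapped)]


def seal(bases: list) -> str:
    """Step 3 (ligase): join the bases into a continuous strand; gaps are not allowed."""
    if any(base is None for base in bases):
        raise ValueError("cannot seal a strand that still contains a gap")
    return "".join(bases)


def repair(strand: str, template: str, index: int) -> str:
    """Run excision, fill-in, and sealing on the base at `index`."""
    return seal(fill_gap(excise(strand, index), template))
```

In this model the undamaged complementary strand alone determines the restored base, which is why repair must precede replication of the damaged strand.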
  • In Bloom's syndrome, an inherited human disease, individuals are partially deficient in DNA ligation and consequently have an increased incidence of cancer (Alberts, supra, p. 247).
  Nucleases
  • Nucleases comprise enzymes that hydrolyze both DNA (DNase) and RNA (RNase). They serve different purposes in nucleic acid metabolism. Nucleases hydrolyze the phosphodiester bonds between adjacent nucleotides either at internal positions (endonucleases) or at the terminal 3' or 5' nucleotide positions (exonucleases).
  • A DNA exonuclease activity in DNA polymerase serves to remove improperly paired nucleotides attached to the 3'-OH end of the growing DNA strand by the polymerase and thereby serves a "proofreading" function. As mentioned above, DNA endonuclease activity is involved in the excision step of the DNA repair process.
  • RNases also serve a variety of functions.
  • RNase P is a ribonucleoprotein enzyme which cleaves the 5' end of pre-tRNAs as part of their maturation process.
  • RNase H digests the RNA strand of an RNA/DNA hybrid. Such hybrids occur in cells invaded by retroviruses, and RNase H is an important enzyme in the retroviral replication cycle.
  • Pancreatic RNase, secreted by the pancreas into the intestine, hydrolyzes RNA present in ingested foods.
  • RNase activity in serum and cell extracts is elevated in a variety of cancers and infectious diseases (Schein, C.H. (1997) Nat. Biotechnol. 15:529-536).
  Methylases
  • Methylation of specific nucleotides occurs in both DNA and RNA, and serves different functions in the two macromolecules. Methylation of cytosine residues to form 5-methyl cytosine in DNA occurs specifically at CG sequences which are base-paired with one another in the DNA double-helix.
  • This pattern of methylation is passed from generation to generation during DNA replication by an enzyme called "maintenance methylase" that acts preferentially on those CG sequences that are base-paired with a CG sequence that is already methylated.
  • Such methylation appears to distinguish active from inactive genes by preventing the binding of regulatory proteins that "turn on" the gene, but permitting the binding of proteins that inactivate the gene (Alberts, supra, pp. 448-451).
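The inheritance of a CG methylation pattern by a maintenance methylase can be sketched as follows. This is an illustrative model only: strands are strings written 5' to 3', methylation is recorded as a set of 0-based cytosine indices, and the function names are hypothetical.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the base-paired partner strand, written 5'->3'."""
    return "".join(PAIR[base] for base in reversed(strand))


def maintenance_methylate(parent: str, parent_methylated: set) -> set:
    """Return the cytosine indices to methylate on the newly made daughter strand.

    A CG at parent index i (C at i, G at i + 1) is base-paired with a CG on the
    daughter strand whose cytosine sits at daughter index len(parent) - 2 - i.
    Only CG sites whose parental cytosine is already methylated are copied.
    """
    n = len(parent)
    daughter_methylated = set()
    for i in parent_methylated:
        if parent[i:i + 2] == "CG":
            daughter_methylated.add(n - 2 - i)
    return daughter_methylated
```

Unmethylated parental CG sites are left untouched on the daughter strand, so the pattern, rather than the mere presence of CG sequences, is what gets copied.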
  • tRNA methylase produces one of several nucleotide modifications in tRNA that affect the conformation and base-pairing of the molecule and facilitate the recognition of the appropriate mRNA codons by specific tRNAs.
  • The primary methylation pattern is the dimethylation of guanine residues to form N,N-dimethyl guanine.
  • Helicases are enzymes that destabilize and unwind double helix structures in both DNA and RNA. Since DNA replication occurs more or less simultaneously on both strands, the two strands must first separate to generate a replication "fork" for DNA polymerase to act on. Two types of replication proteins contribute to this process, DNA helicases and single-stranded binding proteins. DNA helicases hydrolyze ATP and use the energy of hydrolysis to separate the DNA strands. Single-stranded binding proteins (SSBs) then bind to the exposed DNA strands without covering the bases, thereby temporarily stabilizing them for templating by the DNA polymerase (Alberts, supra, pp. 255-256).
  • RNA helicases also alter and regulate RNA conformation and secondary structure. Like the DNA helicases, RNA helicases utilize energy derived from ATP hydrolysis to destabilize and unwind RNA duplexes.
  • The most well-characterized and ubiquitous family of RNA helicases is the DEAD-box family, so named for the conserved B-type ATP-binding motif which is diagnostic of proteins in this family.
  • Over 40 DEAD-box helicases have been identified in organisms as diverse as bacteria, insects, yeast, amphibians, mammals, and plants. DEAD-box helicases function in diverse processes such as translation initiation, splicing, ribosome assembly, and RNA editing, transport, and stability.
  • DEAD-box helicases play tissue- and stage-specific roles in spermatogenesis and embryogenesis.
  • Overexpression of the DEAD-box 1 protein (DDX1) may play a role in the progression of neuroblastoma (Nb) and retinoblastoma (Rb) tumors (Godbout, R. et al. (1998) J. Biol. Chem. 273:21161-21168).
  • DDX1 may promote or enhance tumor progression by altering the normal secondary structure and expression levels of RNA in cancer cells.
  • Other DEAD-box helicases have been implicated either directly or indirectly in tumorigenesis (discussed in Godbout, supra).
  • Murine p68 is mutated in ultraviolet light-induced tumors.
  • Human DDX6 is located at a chromosomal breakpoint associated with B-cell lymphoma.
  • A chimeric protein comprised of DDX10 and NUP98, a nucleoporin protein, may be involved in the pathogenesis of certain myeloid malignancies.
  Topoisomerases
  • DNA topoisomerase effectively acts as a reversible nuclease that hydrolyzes a phosphodiester bond in a DNA strand, permitting the two strands to rotate freely about one another to remove the strain of the helix, and then rejoins the original phosphodiester bond between the two strands.
  • DNA topoisomerase I causes a single-strand break in a DNA helix to allow the rotation of the two strands of the helix about the remaining phosphodiester bond in the opposite strand.
  • DNA topoisomerase II causes a transient break in both strands of a DNA helix where two double helices cross over one another. This type of topoisomerase can efficiently separate two interlocked DNA circles (Alberts, supra, pp. 260-262).
  • Type II topoisomerases are largely confined to proliferating cells in eukaryotes, such as cancer cells. For this reason they are targets for anticancer drugs.
  • Topoisomerase II has been implicated in multi-drug resistance (MDR) as it appears to aid in the repair of DNA damage inflicted by DNA binding agents such as doxorubicin and vincristine.
  • Genetic recombination is the process of rearranging DNA sequences within an organism's genome to provide genetic variation for the organism in response to changes in the environment.
  • DNA recombination allows variation in the particular combination of genes present in an individual's genome, as well as the timing and level of expression of these genes (see Alberts, supra, pp. 263-273).
  • Two broad classes of genetic recombination are commonly recognized, general recombination and site-specific recombination.
  • General recombination involves genetic exchange between any homologous pair of DNA sequences usually located on two copies of the same chromosome.
  • The process is aided by enzymes called recombinases that "nick" one strand of a DNA duplex more or less randomly and permit exchange with the complementary strand of another duplex.
  • The process does not normally change the arrangement of genes on a chromosome.
  • In site-specific recombination, the recombinase recognizes specific nucleotide sequences present in one or both of the recombining molecules. Base-pairing is not involved in this form of recombination and therefore does not require
  • RNA processing steps include capping at the 5' end with methylguanosine, polyadenylating the 3' end, and splicing to remove introns.
  • The primary RNA transcript from DNA is a faithful copy of the gene containing both exon and intron sequences, and the latter sequences must be cut out of the RNA transcript to produce an mRNA that codes for a protein.
  • This "splicing" of the mRNA sequence takes place in the nucleus with the aid of a large, multicomponent ribonucleoprotein complex known as a spliceosome.
  • The spliceosomal complex is composed of five small nuclear ribonucleoprotein particles (snRNPs) designated U1, U2, U4, U5, and U6, and a number of additional proteins.
  • Each snRNP contains a single species of snRNA and about ten proteins.
  • The RNA components of some snRNPs recognize and base pair with intron consensus sequences.
  • The protein components mediate spliceosome assembly and the splicing reaction.
  • Autoantibodies to snRNP proteins are found in the blood of patients with systemic lupus erythematosus (Stryer, L. (1995) Biochemistry, W.H. Freeman and Company, New York NY, p. 863).
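The net effect of splicing on a primary transcript, as described above, is the removal of intron sequences and the joining of the flanking exons. The following sketch assumes the intron boundaries are already known and supplied as half-open index ranges; it does not model how snRNPs locate those boundaries, and the function name is hypothetical.

```python
def splice(pre_mrna: str, introns) -> str:
    """Remove introns, given as (start, end) half-open index ranges, and join the exons."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        if start < position or start >= end or end > len(pre_mrna):
            raise ValueError(f"invalid or overlapping intron range: ({start}, {end})")
        exons.append(pre_mrna[position:start])  # exon preceding this intron
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # final exon
    return "".join(exons)
```

A transcript with no introns is returned unchanged, which matches the case of an intronless gene.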
  • The surface of a cell is rich in transmembrane proteoglycans, glycoproteins, glycolipids, and receptors. These macromolecules mediate adhesion with other cells and with components of the extracellular matrix (ECM).
  • Cadherins comprise a family of calcium-dependent glycoproteins that function in mediating cell-cell adhesion in virtually all solid tissues of multicellular organisms. These proteins share multiple repeats of a cadherin-specific motif, and the repeats form the folding units of the cadherin extracellular domain. Cadherin molecules cooperate to form focal contacts, or adhesion plaques, between adjacent epithelial cells.
  • The cadherin family includes the classical cadherins and protocadherins.
  • Classical cadherins include the E-cadherin, N-cadherin, and P-cadherin subfamilies.
  • E-cadherin is present on many types of epithelial cells and is especially important for embryonic development.
  • N-cadherin is present on nerve, muscle, and lens cells and is also critical for embryonic development.
  • P-cadherin is present on cells of the placenta and epidermis. Recent studies report that protocadherins are involved in a variety of cell-cell interactions (Suzuki, S.T. (1996) J. Cell Sci.
  • The intracellular anchorage of cadherins is regulated by their dynamic association with catenins, a family of cytoplasmic signal transduction proteins associated with the actin cytoskeleton.
  • The anchorage of cadherins to the actin cytoskeleton appears to be regulated by protein tyrosine phosphorylation, and the cadherins are the target of phosphorylation-induced junctional disassembly (Aberle, H. et al. (1996) J. Cell. Biochem. 61:514-523).
  Integrins
  • Integrins are ubiquitous transmembrane adhesion molecules that link the ECM to the internal cytoskeleton. Integrins are composed of two noncovalently associated transmembrane glycoprotein subunits called α and β. Integrins function as receptors that play a role in signal transduction. For example, binding of integrin to its extracellular ligand may stimulate changes in intracellular calcium levels or protein kinase activity (Sjaastad, M.D. and W.J. Nelson (1997) BioEssays 19:47-55). At least ten cell surface receptors of the integrin family recognize the ECM component fibronectin, which is involved in many different biological processes including cell migration and embryogenesis (Johansson, S. et al. (1997) Front. Biosci. 2:D126-D146).
  Lectins
  • Lectins comprise a ubiquitous family of extracellular glycoproteins which bind cell surface carbohydrates specifically and reversibly, resulting in the agglutination of cells (reviewed in Drickamer, K. and M.E. Taylor (1993) Annu. Rev. Cell Biol. 9:237-264). This function is particularly important for activation of the immune response. Lectins mediate the agglutination and mitogenic stimulation of lymphocytes at sites of inflammation (Lasky, L.A. (1991) J. Cell. Biochem. 45:139-146; Paietta, E. et al. (1989) J. Immunol. 143:2850-2857).
  • Lectins are further classified into subfamilies based on carbohydrate-binding specificity and other criteria.
  • The galectin subfamily includes lectins that bind β-galactoside carbohydrate moieties in a thiol-dependent manner (reviewed in Hadari, Y.R. et al. (1998) J. Biol. Chem. 270:3447-3453).
  • Galectins are widely expressed and developmentally regulated. Because all galectins lack an N-terminal signal peptide, it is suggested that galectins are externalized through an atypical secretory mechanism.
  • Two classes of galectins have been defined based on molecular weight and oligomerization properties.
  • Small galectins form homodimers and are about 14 to 16 kilodaltons in mass, while large galectins are monomeric and about 29-37 kilodaltons.
  • Galectins contain a characteristic carbohydrate recognition domain (CRD).
  • The CRD is about 140 amino acids and contains several stretches of about 1 - 10 amino acids which are highly conserved among all galectins.
  • A particular 6-amino acid motif within the CRD contains conserved tryptophan and arginine residues which are critical for carbohydrate binding.
  • The CRD of some galectins also contains cysteine residues which may be important for disulfide bond formation. Secondary structure predictions indicate that the CRD forms several β-sheets.
  • Galectins play a number of roles in diseases and conditions associated with cell-cell and cell-matrix interactions. For example, certain galectins associate with sites of inflammation and bind to cell surface immunoglobulin E molecules. In addition, galectins may play an important role in cancer metastasis. Galectin overexpression is correlated with the metastatic potential of cancers in humans and mice. Moreover, anti-galectin antibodies inhibit processes associated with cell transformation, such as cell aggregation and anchorage-independent growth (see, for example, Su, Z.-Z. et al.
  • Selectins comprise a specialized lectin subfamily involved primarily in inflammation and leukocyte adhesion (reviewed in Lasky, supra). Selectins mediate the recruitment of leukocytes from the circulation to sites of acute inflammation and are expressed on the surface of vascular endothelial cells in response to cytokine signaling. Selectins bind to specific ligands on the leukocyte cell membrane and enable the leukocyte to adhere to and migrate along the endothelial surface. Binding of selectin to its ligand leads to polarized rearrangement of the actin cytoskeleton and stimulates signal transduction within the leukocyte (Brenner, B. et al. (1997) Biochem.
  • The selectins include lymphocyte adhesion molecule-1 (Lam-1 or L-selectin), endothelial leukocyte adhesion molecule-1 (ELAM-1 or E-selectin), and granule membrane protein-140 (GMP-140 or P-selectin) (Johnston, G.I. et al. (1989) Cell 56:1033-1044).
  Antigen Recognition Molecules
  • All vertebrates have developed sophisticated and complex immune systems that provide protection from viral, bacterial, fungal, and parasitic infections.
  • A key feature of the immune system is its ability to distinguish foreign molecules, or antigens, from "self" molecules.
  • This ability is mediated primarily by secreted and transmembrane proteins expressed by leukocytes (white blood cells) such as lymphocytes, granulocytes, and monocytes. Most of these proteins belong to the immunoglobulin (Ig) superfamily, members of which contain one or more repeats of a conserved structural domain. This Ig domain is comprised of antiparallel β sheets joined by a disulfide bond in an arrangement called the Ig fold.
  • Members of the Ig superfamily include T-cell receptors, major histocompatibility (MHC) proteins, antibodies, and immune cell-specific surface markers such as CD4, CD8, and CD28.
  • MHC proteins are cell surface markers that bind to and present foreign antigens to T cells. MHC molecules are classified as either class I or class II. Class I MHC molecules (MHC I) are expressed on the surface of almost all cells and are involved in the presentation of antigen to cytotoxic T cells. For example, a cell infected with virus will degrade intracellular viral proteins and express the protein fragments bound to MHC I molecules on the cell surface. The MHC I/antigen complex is recognized by cytotoxic T-cells which destroy the infected cell and the virus within.
  • Class II MHC molecules are expressed primarily on specialized antigen-presenting cells of the immune system, such as B-cells and macrophages. These cells ingest foreign proteins from the extracellular fluid and express MHC II/antigen complex on the cell surface. This complex activates helper T-cells, which then secrete cytokines and other factors that stimulate the immune response. MHC molecules also play an important role in organ rejection following transplantation. Rejection occurs when the recipient's T-cells respond to foreign MHC molecules on the transplanted organ in the same way as to self MHC molecules bound to foreign antigen. (Reviewed in Alberts, B. et al. (1994) Molecular Biology of the Cell. Garland Publishing, New York NY, pp. 1229-1246.)
  • Antibodies are either expressed on the surface of B-cells or secreted by B-cells into the circulation. Antibodies bind and neutralize foreign antigens in the blood and other extracellular fluids.
  • The prototypical antibody is a tetramer consisting of two identical heavy polypeptide chains (H-chains) and two identical light polypeptide chains (L-chains) interlinked by disulfide bonds. This arrangement confers the characteristic Y-shape to antibody molecules. Antibodies are classified based on their H-chain composition.
  • The five antibody classes, IgA, IgD, IgE, IgG and IgM, are defined by the α, δ, ε, γ, and μ H-chain types. There are two types of L-chains, κ and λ, either of which may associate as a pair with any H-chain pair. IgG, the most common class of antibody found in the circulation, is tetrameric, while the other classes of antibodies are generally variants or multimers of this basic structure.
  • H-chains and L-chains each contain an N-terminal variable region and a C-terminal constant region.
  • The constant region consists of about 110 amino acids in L-chains and about 330 or 440 amino acids in H-chains.
  • The amino acid sequence of the constant region is nearly identical among H- or L-chains of a particular class.
  • The variable region consists of about 110 amino acids in both H- and L-chains. However, the amino acid sequence of the variable region differs among H- or L-chains of a particular class.
  • Within each H- or L-chain variable region are three hypervariable regions of extensive sequence diversity, each consisting of about 5 to 10 amino acids. In the antibody molecule, the H- and L-chain hypervariable regions come together to form the antigen recognition site.
  • Both H-chains and L-chains contain repeated Ig domains.
  • A typical H-chain contains four Ig domains, three of which occur within the constant region and one of which occurs within the variable region and contributes to the formation of the antigen recognition site.
  • A typical L-chain contains two Ig domains, one of which occurs within the constant region and one of which occurs within the variable region.
  • The immune system is capable of recognizing and responding to any foreign molecule that enters the body. Therefore, the immune system must be armed with a full repertoire of antibodies against all potential antigens.
  • Such antibody diversity is generated by somatic rearrangement of gene segments encoding variable and constant regions. These gene segments are joined together by site-specific recombination which occurs between highly conserved DNA sequences that flank each gene segment. Because there are hundreds of different gene segments, millions of unique genes can be generated combinatorially. In addition, imprecise joining of these segments and an unusually high rate of somatic mutation within these segments further contribute to the generation of a diverse antibody population.
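The combinatorial arithmetic behind "hundreds of segments, millions of genes" can be made concrete with a short calculation. The segment family names ("V", "D", "J") and all counts below are hypothetical placeholders chosen for illustration; they are not taken from this document and do not represent measured values. The sketch also ignores imprecise joining and somatic mutation, which would increase diversity further.

```python
from math import prod


def combinatorial_diversity(heavy_segments: dict, light_segments: dict):
    """Return (heavy-chain combinations, light-chain combinations, paired combinations).

    Each argument maps a segment family name to the number of alternative
    segments in that family; one segment from every family is joined per chain.
    """
    heavy = prod(heavy_segments.values())
    light = prod(light_segments.values())
    return heavy, light, heavy * light


# Hypothetical segment counts, for illustration only.
heavy_counts = {"V": 100, "D": 20, "J": 5}
light_counts = {"V": 100, "J": 5}
heavy, light, paired = combinatorial_diversity(heavy_counts, light_counts)
```

With these placeholder counts, 230 segments in total yield 10,000 heavy-chain and 500 light-chain combinations, or 5,000,000 possible pairings.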
  • T-ceU receptors are both structuraUy and functionaUy related to antibodies. (Reviewed in Alberts, supra, pp. 1228-1229.) T-ceU receptors are ceU surface proteins that bind foreign antigens and mediate diverse aspects of the immune response.
  • a typical T-ceU receptor is a heterodimer comprised of two disulfide-linked polypeptide chains caUed ⁇ and ⁇ . Each chain is about 280 amino 0 acids in length and contains one variable region and one constant region. Each variable or constant region folds into an Ig domain. The variable regions from the ⁇ and ⁇ chains come together in the heterodimer to form the antigen recognition site.
  • T-ceU receptor diversity is generated by somatic rearrangement of gene segments encoding die ⁇ and ⁇ chains.
  • T-cell receptors recognize smaU peptide antigens that are expressed on the surface of antigen-presenting ceUs and pathogen-infected 5 ceHs. These peptide antigens are presented on the ceU surface in association with major histocompatibiHty proteins wliich provide the proper context for antigen recognition.
  • Protein secretion is essential for cellular function. Protein secretion is mediated by a signal peptide located at the amino terminus of the protein to be secreted.
  • The signal peptide is comprised of about ten to twenty hydrophobic amino acids which target the nascent protein from the ribosome to the endoplasmic reticulum (ER). Proteins targeted to the ER may either proceed through the secretory pathway or remain in any of the secretory organelles such as the ER, Golgi apparatus, or lysosomes. Proteins that transit through the secretory pathway are either secreted into the extracellular space or retained in the plasma membrane.
  • Secreted proteins are often synthesized as inactive precursors that are activated by post-translational processing events during transit through the secretory pathway. Such events include glycosylation, proteolysis, and removal of the signal peptide by a signal peptidase. Other events that may occur during protein transport include chaperone-dependent unfolding and folding of the nascent protein and interaction of the protein with a receptor or pore complex. Examples of secreted proteins with amino terminal signal peptides include receptors, extracellular matrix molecules, cytokines, hormones, growth and differentiation factors, neuropeptides, vasomediators, ion channels, transporters/pumps, and proteases. (Reviewed in Alberts, B. et al.
  • The extracellular matrix is a complex network of glycoproteins, polysaccharides, proteoglycans, and other macromolecules that are secreted from the cell into the extracellular space.
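The description of a signal peptide as a stretch of about ten to twenty hydrophobic amino acids at the amino terminus suggests a simple screening heuristic. The Python sketch below is a minimal illustration under stated assumptions: the hydrophobic residue set, the 30-residue window, the ten-residue threshold, and the toy sequence are all hypothetical choices for the example and are not part of this disclosure. It is not a substitute for a dedicated signal peptide prediction method.

```python
# Assumed hydrophobic residue set (one-letter codes); other scales would differ.
HYDROPHOBIC = set("AVLIMFW")


def longest_hydrophobic_run(seq, window=30):
    """Length of the longest unbroken hydrophobic run in the first `window` residues."""
    best = current = 0
    for residue in seq[:window].upper():
        current = current + 1 if residue in HYDROPHOBIC else 0
        best = max(best, current)
    return best


def has_candidate_signal_peptide(seq, min_run=10):
    """True if the amino terminus contains a hydrophobic run of at least `min_run`."""
    return longest_hydrophobic_run(seq) >= min_run


# Toy sequence invented for illustration: a charged start, then 12 hydrophobic residues.
toy = "MKRLLLVLLALLAVFQGSEDTK"
print(longest_hydrophobic_run(toy))       # 12
print(has_candidate_signal_peptide(toy))  # True
```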
  • ECM extracellular matrix
  • The ECM remains in close association with the cell surface and provides a supportive meshwork that profoundly influences cell shape, motility, strength, flexibility, and adhesion.
  • Adhesion of a cell to its surrounding matrix is required for cell survival except in the case of metastatic tumor cells, which have overcome the need for cell-ECM anchorage.
  • ECM plays a critical role in the molecular mechanisms of growth control and metastasis.
  • (Ruoslahti, E. (1996) Sci. Am. 275:72-77.)
  • The ECM determines the structure and physical properties of connective tissue and is particularly important for morphogenesis and other processes associated with embryonic development and pattern formation.
  • The collagens comprise a family of ECM proteins that provide structure to bone, teeth, skin, ligaments, tendons, cartilage, blood vessels, and basement membranes. Multiple collagen proteins have been identified. Three collagen molecules fold together in a triple helix stabilized by interchain disulfide bonds. Bundles of these triple helices then associate to form fibrils. Collagen primary structure consists of hundreds of (Gly-X-Y) repeats where about a third of the X and Y residues are Pro. Glycines are crucial to helix formation as the bulkier amino acid sidechains cannot fold into the triple helical conformation. Because of these strict sequence requirements, mutations in collagen genes have severe consequences.
  • Osteogenesis imperfecta patients have brittle bones that fracture easily; in severe cases patients die in utero or at birth.
  • Ehlers-Danlos syndrome patients have hyperelastic skin, hypermobile joints, and susceptibility to aortic and intestinal rupture.
  • Chondrodysplasia patients have short stature and ocular disorders.
  • Alport syndrome patients have hematuria, sensorineural deafness, and eye lens deformation. (Isselbacher, KJ. et al. (1994)
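The (Gly-X-Y) repeat structure of collagen described above lends itself to a simple sequence check: every third residue should be glycine, and a substitution at a glycine position interrupts the repeat. The Python sketch below is a minimal illustration; the function names and the toy sequences are assumptions made for the example and do not correspond to any sequence in this disclosure.

```python
def triplets(seq):
    """Split a sequence into consecutive triplets, dropping any incomplete tail."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def broken_repeats(seq):
    """0-based indices of triplets whose first residue is not glycine (G)."""
    return [i for i, t in enumerate(triplets(seq)) if t[0] != "G"]


def proline_fraction(seq):
    """Fraction of X and Y positions occupied by proline (P)."""
    xy = "".join(t[1:] for t in triplets(seq))
    return xy.count("P") / len(xy) if xy else 0.0


# Toy repeat region invented for illustration: GPA GPP GAK GEP GSP
toy = "GPAGPPGAKGEPGSP"
# Same region with the third glycine replaced by alanine.
mutant = toy[:6] + "A" + toy[7:]

print(broken_repeats(toy))     # []
print(broken_repeats(mutant))  # [2]
print(proline_fraction(toy))   # 0.5
```

A check of this kind only flags positions where the repeat is interrupted; it does not predict the clinical severity of any particular substitution.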
  • Elastin and related proteins confer elasticity to tissues such as skin, blood vessels, and lungs.
  • Elastin is a highly hydrophobic protein of about 750 amino acids that is rich in proline and glycine residues.
  • Elastin molecules are highly cross-linked, forming an extensive extracellular network of fibers and sheets.
  • Elastin fibers are surrounded by a sheath of microfibrils which are composed of a number of glycoproteins, including fibrillin. Mutations in the gene encoding fibrillin are responsible for Marfan's syndrome, a genetic disorder characterized by defects in connective tissue. In severe cases, the aortas of afflicted individuals are prone to rupture. (Reviewed in Alberts, supra, pp. 984-986.)
  • Fibronectin is a large ECM glycoprotein found in all vertebrates. Fibronectin exists as a dimer of two subunits, each containing about 2,500 amino acids. Each subunit folds into a rod-like structure containing multiple domains. The domains each contain multiple repeated modules, the most common of which is the type III fibronectin repeat. The type III fibronectin repeat is about 90 amino acids in length and is also found in other ECM proteins and in some plasma membrane and cytoplasmic proteins. Furthermore, some type III fibronectin repeats contain a characteristic tripeptide consisting of Arginine-Glycine-Aspartic acid (RGD).
  • RGD Arginine-Glycine-Aspartic acid
  • The RGD sequence is recognized by the integrin family of cell surface receptors and is also found in other ECM proteins. Disruption of both copies of the gene encoding fibronectin causes early embryonic lethality in mice. The mutant embryos display extensive morphological defects, including defects in the formation of the notochord, somites, heart, blood vessels, neural tube, and extraembryonic structures. (Reviewed in Alberts, supra, pp. 986-987.)
  • Laminin is a major glycoprotein component of the basal lamina which underlies and supports epithelial cell sheets.
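Because the RGD tripeptide is a short, fixed sequence, candidate integrin recognition sites can be located in a protein sequence with a plain substring search. The Python sketch below is illustrative only: the function name and the toy sequence are assumptions for the example, and the presence of an RGD tripeptide in a sequence does not by itself establish that the site is recognized by an integrin.

```python
def find_motif(seq, motif="RGD"):
    """Return the 0-based start position of every occurrence of `motif` in `seq`."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        # Resume one residue later so overlapping occurrences are also reported.
        start = seq.find(motif, start + 1)
    return positions


# Toy sequence invented for illustration; it contains two RGD tripeptides.
toy = "MKTRGDSPAAGRGDQ"
print(find_motif(toy))  # [3, 11]
```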
  • Laminin is one of the first ECM proteins synthesized in the developing embryo.
  • Laminin is an 850 kilodalton protein composed of three polypeptide chains joined in the shape of a cross by disulfide bonds.
  • Laminin is especially important for angiogenesis and, in particular, for guiding the formation of capillaries. (Reviewed in Alberts, supra, pp. 990-991.)
  • Proteoglycans are composed of unbranched polysaccharide chains (glycosaminoglycans) attached to protein cores. Common proteoglycans include aggrecan, betaglycan, decorin, perlecan, serglycin, and syndecan-1. Some of these molecules not only provide mechanical support, but also bind to extracellular signaling molecules, such as fibroblast growth factor and transforming growth factor β, suggesting a role for proteoglycans in cell-cell communication and cell growth. (Reviewed in Alberts, supra, pp.
  • The glycoproteins tenascin-C and tenascin-R are expressed in developing and lesioned neural tissue and provide stimulatory and anti-adhesive (inhibitory) properties, respectively, for axonal growth. (Faissner, A. (1997) Cell Tissue Res. 290:331-341.)
  • The cytoskeleton is a cytoplasmic network of protein fibers that mediate cell shape, structure, and movement.
  • The cytoskeleton supports the cell membrane and forms tracks along which organelles and other elements move in the cytosol.
  • The cytoskeleton is a dynamic structure that allows cells to adopt various shapes and to carry out directed movements.
  • Major cytoskeletal fibers include the microtubules, the microfilaments, and the intermediate filaments.
  • The motor protein dynamin drives the formation of membrane vesicles. Accessory or associated proteins modify the structure or activity of the fibers while cytoskeletal membrane anchors connect the fibers to the cell membrane.
  • Tubulins
  • Microtubules, cytoskeletal fibers with a diameter of about 24 nm, have multiple roles in the cell. Bundles of microtubules form cilia and flagella, which are whip-like extensions of the cell membrane that are necessary for sweeping materials across an epithelium and for swimming of sperm, respectively. Marginal bands of microtubules in red blood cells and platelets are important for these cells' pliability. Organelles, membrane vesicles, and proteins are transported in the cell along tracks of microtubules. For example, microtubules run through nerve cell axons, allowing bidirectional transport of materials and membrane vesicles between the cell body and the nerve terminal. Failure to supply the nerve terminal with these vesicles blocks the transmission of neural signals. Microtubules are also critical to chromosomal movement during cell division. Both stable and short-lived populations of microtubules exist in the cell.
  • Microtubules are polymers of GTP-binding tubulin protein subunits. Each subunit is a heterodimer of α- and β-tubulin, multiple isoforms of which exist.
  • The hydrolysis of GTP is linked to the addition of tubulin subunits at the end of a microtubule.
  • The subunits interact head to tail to form protofilaments; the protofilaments interact side to side to form a microtubule.
  • A microtubule is polarized, one end ringed with α-tubulin and the other with β-tubulin, and the two ends differ in their rates of assembly.
  • Each microtubule is composed of 13 protofilaments, although 11 or 15 protofilament-microtubules are sometimes found.
  • Cilia and flagella contain doublet microtubules.
  • Microtubules grow from specialized structures known as centrosomes or microtubule-organizing centers (MTOCs). MTOCs may contain one or two centrioles, which are pinwheel arrays of triplet microtubules.
  • The basal body, the organizing center located at the base of a cilium or flagellum, contains one centriole.
  • Gamma tubulin present in the MTOC is important for nucleating the polymerization of α- and β-tubulin heterodimers but does not polymerize into microtubules.
  • Microtubule-Associated Proteins
  • Microtubule-associated proteins (MAPs) have roles in the assembly and stabilization of microtubules.
  • Assembly MAPs can be identified in neurons as well as non-neuronal cells.
  • Assembly MAPs are responsible for cross-linking microtubules in the cytosol. These MAPs are organized into two domains: a basic microtubule-binding domain and an acidic projection domain. The projection domain is the binding site for membranes, intermediate filaments, or other microtubules. Based on sequence analysis, assembly MAPs can be further grouped into two types: Type I and Type II.
  • Type I MAPs, which include MAP1A and MAP1B, are large, filamentous molecules that co-purify with microtubules and are abundantly expressed in brain and testes.
  • Type I MAPs contain several repeats of a positively-charged amino acid sequence motif that binds and neutralizes negatively charged tubulin, leading to stabilization of microtubules.
  • MAP1A and MAP1B are each derived from a single precursor polypeptide that is subsequently proteolytically processed to generate one heavy chain and one light chain.
  • Another light chain, LC3, is a 16.4 kDa molecule that binds MAP1A, MAP1B, and microtubules. It is suggested that LC3 is synthesized from a source other than the MAP1A or MAP1B transcripts, and that the expression of LC3 may be important in regulating the microtubule binding activity of MAP1A and MAP1B during cell proliferation (Mann, S.S. et al. (1994) J. Biol.
  • Type II MAPs, which include MAP2a, MAP2b, MAP2c, MAP4, and Tau, are characterized by three to four copies of an 18-residue sequence in the microtubule-binding domain.
  • MAP2a, MAP2b, and MAP2c are found only in dendrites.
  • MAP4 is found in non-neuronal cells.
  • Tau is found in axons and dendrites of nerve cells.
  • Alternative splicing of the Tau mRNA leads to the existence of multiple forms of Tau protein.
  • Tau phosphorylation is altered in neurodegenerative disorders such as Alzheimer's disease, Pick's disease, progressive supranuclear palsy, corticobasal degeneration, and familial frontotemporal dementia and Parkinsonism linked to chromosome 17.
  • The altered Tau phosphorylation leads to a collapse of the microtubule network and the formation of intraneuronal Tau aggregates (Spillantini, M.G. and M. Goedert (1998) Trends Neurosci. 21:428-433).
  • The protein pericentrin is found in the MTOC and has a role in microtubule assembly.
  • Actins
  • Microfilaments, cytoskeletal filaments with a diameter of about 7-9 nm, are vital to cell locomotion, cell shape, cell adhesion, cell division, and muscle contraction. Assembly and disassembly of the microfilaments allow cells to change their morphology.
  • Microfilaments are the polymerized form of actin, the most abundant intracellular protein in the eukaryotic cell. Human cells contain six isoforms of actin. The three α-actins are found in different kinds of muscle, nonmuscle β-actin and nonmuscle γ-actin are found in nonmuscle cells, and another γ-actin is found in intestinal smooth muscle cells.
  • G-actin, the monomeric form of actin, polymerizes into polarized, helical F-actin filaments, accompanied by the hydrolysis of ATP to ADP.
  • Actin filaments associate to form bundles and networks, providing a framework to support the plasma membrane and determine cell shape. These bundles and networks are connected to the cell membrane.
  • In muscle cells, thin filaments containing actin slide past thick filaments containing the motor protein myosin during contraction.
  • A family of actin-related proteins exists that are not part of the actin cytoskeleton, but rather associate with microtubules and dynein.
  • Actin-Associated Proteins
  • Actin-associated proteins have roles in cross-linking, severing, and stabilization of actin filaments and in sequestering actin monomers. Several of the actin-associated proteins have multiple functions. Bundles and networks of actin filaments are held together by actin cross-linking proteins.
  • Group I cross-linking proteins have unique actin-binding domains and include the 30 kD protein, EF-1a, fascin, and scruin.
  • Group II cross-linking proteins have a 7,000-MW actin-binding domain and include villin and dematin.
  • Group III cross-linking proteins have pairs of a
  • Severing proteins regulate the length of actin filaments by breaking them into short pieces or by blocking their ends. Severing proteins include gCAP39, severin (fragmin), gelsolin, and villin. Capping proteins can cap the ends of actin filaments, but cannot break filaments. Capping proteins include CapZ and tropomodulin. The proteins thymosin and profilin sequester actin monomers in the cytosol, allowing a pool of unpolymerized actin to exist. The actin-associated proteins tropomyosin, troponin, and caldesmon regulate muscle contraction in response to calcium.
  • Intermediate filaments are cytoskeletal fibers with a diameter of about 10 nm, intermediate between that of microfilaments and microtubules.
  • IFs serve structural roles in the cell, reinforcing cells and organizing cells into tissues.
  • IFs are particularly abundant in epidermal cells and in neurons.
  • IFs are extremely stable, and, in contrast to microfilaments and microtubules, do not function in cell motility.
  • Five types of IF proteins are known in mammals. Type I and Type II proteins are the acidic and basic keratins, respectively. Heterodimers of the acidic and basic keratins are the building blocks of keratin IFs.
  • Keratins are abundant in soft epithelia such as skin and cornea, hard epithelia such as nails and hair, and in epithelia that line internal body cavities. Mutations in keratin genes lead to epithelial diseases including epidermolysis bullosa simplex, bullous congenital ichthyosiform erythroderma (epidermolytic hyperkeratosis), non-epidermolytic and epidermolytic palmoplantar keratoderma, ichthyosis bullosa of Siemens, pachyonychia congenita, and white sponge nevus. Some of these diseases result in severe skin blistering. (See, e.g., Wawersik, M. et al. (1997) J. Biol. Chem. 272:32557-32565; and Corden L.D. and W.H. McLean (1996) Exp. Dermatol. 5:297-307.)
  • Type III IF proteins include desmin, glial fibrillary acidic protein, vimentin, and peripherin.
  • Desmin filaments in muscle cells link myofibrils into bundles and stabilize sarcomeres in contracting muscle.
  • Glial fibrillary acidic protein filaments are found in the glial cells that surround neurons and astrocytes.
  • Vimentin filaments are found in blood vessel endothelial cells, some epithelial cells, and mesenchymal cells such as fibroblasts, and are commonly associated with microtubules. Vimentin filaments may have roles in keeping the nucleus and other organelles in place in the cell.
  • Type IV IFs include the neurofilaments and nestin. Neurofilaments, composed of three polypeptides NF-L, NF-M, and NF-H, are frequently associated with microtubules in axons. Neurofilaments are responsible for the radial growth and diameter of an axon, and ultimately for the speed of nerve impulse transmission. Changes in phosphorylation and metabolism of neurofilaments are observed in neurodegenerative diseases including amyotrophic lateral sclerosis, Parkinson's disease, and Alzheimer's disease (Julien, J.P. and W.E. Mushynski (1998) Prog. Nucleic Acid Res. Mol. Biol. 61:1-23). Type V IFs, the lamins, are found in the nucleus where they support the nuclear membrane.
  • IFs have a central α-helical rod region interrupted by short nonhelical linker segments.
  • The rod region is bracketed, in most cases, by non-helical head and tail domains.
  • The rod regions of intermediate filament proteins associate to form a coiled-coil dimer.
  • A highly ordered assembly process leads from the dimers to the IFs. Neither ATP nor GTP is needed for IF assembly, unlike that of microfilaments and microtubules.
  • IF-associated proteins (IFAPs) mediate the interactions of IFs with one another and with other cell structures.
  • IFAPs cross-link IFs into a bundle, into a network, or to the plasma membrane, and may cross-link IFs to the microfilament and microtubule cytoskeleton.
  • Microtubules and IFs are in particular closely associated.
  • IFAPs include BPAG1, plakoglobin, desmoplakin I, desmoplakin II, plectin, ankyrin, filaggrin, and lamin B receptor.
  • Cytoskeletal-Membrane Anchors
  • Cytoskeletal fibers are attached to the plasma membrane by specific proteins. These attachments are important for maintaining cell shape and for muscle contraction.
  • The spectrin-actin cytoskeleton is attached to the cell membrane by three proteins, band 4.1, ankyrin, and adducin. Defects in this attachment result in abnormally shaped cells which are more rapidly degraded by the spleen, leading to anemia.
  • The spectrin-actin cytoskeleton is also linked to the membrane by ankyrin; a second actin network is anchored to the membrane by filamin.
  • The protein dystrophin links actin filaments to the plasma membrane; mutations in the dystrophin gene lead to Duchenne muscular dystrophy. In adherens junctions and adhesion plaques, the peripheral membrane proteins α-actinin and vinculin attach actin filaments to the cell membrane.
  • IFs are also attached to membranes by cytoskeletal-membrane anchors.
  • The nuclear lamina is attached to the inner surface of the nuclear membrane by the lamin B receptor.
  • Vimentin IFs are attached to the plasma membrane by ankyrin and plectin.
  • Desmosome and hemidesmosome membrane junctions hold together epithelial cells of organs and skin. These membrane junctions allow shear forces to be distributed across the entire epithelial cell layer, thus providing strength and rigidity to the epithelium.
  • IFs in epithelial cells are attached to the desmosome by plakoglobin and desmoplakins. The proteins that link IFs to hemidesmosomes are not known.
  • Desmin IFs surround the sarcomere in muscle and are linked to the plasma membrane by paranemin, synemin, and ankyrin.
  • Myosin-related Motor Proteins
  • Myosins are actin-activated ATPases, found in eukaryotic cells, that couple hydrolysis of ATP with motion. Myosin provides the motor function for muscle contraction and intracellular movements such as phagocytosis and rearrangement of cell contents during mitotic cell division (cytokinesis).
  • The contractile unit of skeletal muscle, termed the sarcomere, consists of highly ordered arrays of thin actin-containing filaments and thick myosin-containing filaments. Crossbridges form between the thick and thin filaments, and the ATP-dependent movement of myosin heads within the thick filaments pulls the thin filaments, shortening the sarcomere and thus the muscle fiber.
  • Myosins are composed of one or two heavy chains and associated light chains.
  • Myosin heavy chains contain an amino-terminal motor or head domain, a neck that is the site of light-chain binding, and a carboxy-terminal tail domain.
  • The tail domains may associate to form an α-helical coiled coil.
  • Conventional myosins, such as those found in muscle tissue, are composed of two myosin heavy-chain subunits, each associated with two light-chain subunits that bind at the neck region and play a regulatory role.
  • Unconventional myosins, believed to function in intracellular motion, may contain either one or two heavy chains and associated light chains. There is evidence for about 25 myosin heavy chain genes in vertebrates, more than half of them unconventional.
  • Dynein-related Motor Proteins
  • Dyneins are (-) end-directed motor proteins which act on microtubules. Two classes of dyneins, cytosolic and axonemal, have been identified. Cytosolic dyneins are responsible for translocation of materials along cytoplasmic microtubules, for example, transport from the nerve
  • Cytoplasmic dyneins are also reported to play a role in mitosis.
  • Axonemal dyneins are responsible for the beating of flagella and cilia.
  • Dynein on one microtubule doublet walks along the adjacent microtubule doublet. This sliding force produces bending forces that cause the flagellum or cilium to beat.
  • Dyneins have a native mass between 1000 and 2000 kDa and contain either two or three force-producing heads driven
  • Kinesins are (+) end-directed motor proteins which act on microtubules.
  • The prototypical kinesin molecule is involved in the transport of membrane-bound vesicles and organelles. This
  • Kinesin is also important in all cell types for the transport of vesicles from the Golgi complex to the endoplasmic reticulum. This role is critical for maintaining the identity and functionality of these secretory organelles.
  • Kinesins define a ubiquitous, conserved family of over 50 proteins that can be classified into at least 8 subfamilies based on primary amino acid sequence, domain structure, velocity of movement,
  • The prototypical kinesin molecule is a heterotetramer comprised of two heavy polypeptide chains (KHCs) and two light polypeptide chains (KLCs).
  • KHCs heavy polypeptide chains
  • KLCs light polypeptide chains
  • KHC subunits are typically referred to as "kinesin."
  • KHC is about 1000 amino acids in length
  • KLC is about 550 amino acids in length.
  • Two KHCs dimerize to form a rod-shaped
  • molecule with three distinct regions of secondary structure. At one end of the molecule is a globular motor domain that functions in ATP hydrolysis and microtubule binding. Kinesin motor domains are
  • KRPs called kinesin-related proteins
  • Dynamin is a large GTPase motor protein that functions as a "molecular pinchase," generating a mechanochemical force used to sever membranes. This activity is important in forming clathrin-coated vesicles from coated pits in endocytosis and in the biogenesis of synaptic vesicles in neurons.
  • Binding of dynamin to a membrane leads to dynamin's self-assembly into spirals that may act to constrict a flat membrane surface into a tubule.
  • GTP hydrolysis induces a change in conformation of the dynamin polymer that pinches the membrane tubule, leading to severing of the membrane tubule and formation of a membrane vesicle.
  • Release of GDP and inorganic phosphate leads to dynamin disassembly.
  • Following disassembly, the dynamin may either dissociate from the membrane or remain associated to the vesicle and be transported to another region of the cell.
  • Three homologous dynamin genes have been discovered, in addition to several dynamin-related proteins. Conserved dynamin regions are the N-terminal GTP-binding domain, a central pleckstrin homology domain that binds membranes, a central coiled-coil region that may activate dynamin's GTPase activity, and a C-terminal proline-rich domain that contains several motifs that bind SH3 domains on other proteins. Some dynamin-related proteins do not contain the pleckstrin homology domain or the proline-rich domain. (See McNiven, M.A. (1998) Cell 94:151-154; Scaife, R.M. and R.L. Margolis (1997) Cell. Signal. 9:395-401.)
  • Ribosomal Molecules
  • Ribosomal RNAs are assembled, along with ribosomal proteins, into ribosomes, which are cytoplasmic particles that translate messenger RNA into polypeptides.
  • The eukaryotic ribosome is composed of a 60S (large) subunit and a 40S (small) subunit, which together form the 80S ribosome.
  • The ribosome also contains more than fifty proteins.
  • The ribosomal proteins have a prefix which denotes the subunit to which they belong, either L (large) or S (small).
  • Ribosomal protein activities include binding rRNA and organizing the conformation of the junctions between rRNA helices (Woodson, S.A. and N.B. Leontis (1998) Curr. Opin. Struct. Biol. 8:294-300; Ramakrishnan, V. and S.W. White (1998) Trends Biochem. Sci. 23:208-212.)
  • Three important sites are identified on the ribosome.
  • The aminoacyl-tRNA site (A site) is where charged tRNAs (with the exception of the initiator-tRNA) bind on arrival at the ribosome.
  • The peptidyl-tRNA site (P site) is where new peptide bonds are formed, as well as where the initiator tRNA binds.
  • The exit site (E site) is where deacylated tRNAs bind prior to their release from the ribosome. (The ribosome is reviewed in Stryer, L. (1995) Biochemistry W.H. Freeman and Company, New York NY, pp. 888-908; and Lodish, H. et al. (1995) Molecular Cell Biology Scientific American Books, New York NY. pp. 119-138.)
  • The nuclear DNA of eukaryotes is organized into chromatin. Two types of chromatin are observed: euchromatin, some of which may be transcribed, and heterochromatin, so densely packed that much of it is inaccessible to transcription. Chromatin packing thus serves to regulate protein expression in eukaryotes. Bacteria lack chromatin and the chromatin-packing level of gene regulation.
  • The fundamental unit of chromatin is the nucleosome of 200 DNA base pairs associated with two copies each of histones H2A, H2B, H3, and H4. Adjacent nucleosomes are linked by another class of histones, H1.
  • Low molecular weight non-histone proteins called the high mobility group (HMG), associated with chromatin, may function in the unwinding of DNA and stabilization of single-stranded DNA.
  • Chromodomain proteins function in compaction of chromatin into its transcriptionally silent heterochromatin form.
  • During mitosis, all DNA is compacted into heterochromatin and transcription ceases. Transcription in interphase begins with the activation of a region of chromatin. Active chromatin is decondensed. Decondensation appears to be accompanied by changes in binding coefficient, phosphorylation and acetylation states of chromatin histones. HMG proteins HMG13 and HMG17 selectively bind activated chromatin. Topoisomerases remove superhelical tension on DNA. The activated region decondenses, allowing gene regulatory proteins and transcription factors to assemble on the DNA.
  • Patterns of chromatin structure can be stably inherited, producing heritable patterns of gene expression.
  • One of the two X chromosomes in each female cell is inactivated by condensation to heterochromatin during zygote development.
  • The inactive state of this chromosome is inherited, so that adult females are mosaics of clusters of paternal-X and maternal-X clonal cell groups.
  • The condensed X chromosome is reactivated in meiosis.
  • Chromatin is associated with disorders of protein expression such as thalassemia, a genetic anemia resulting from the removal of the locus control region (LCR) required for decondensation of the globin gene locus.
  • LCR locus control region
  • Electron carriers such as cytochromes accept electrons from NADH or FADH2 and donate them to other electron carriers.
  • Adrenodoxin, for example, is an FeS protein that forms a complex with NADPH:adrenodoxin reductase and cytochrome p450.
  • Cytochromes contain a heme prosthetic group, a porphyrin ring containing a tightly bound iron atom. Electron transfer reactions play a crucial role in cellular energy production.
  • Glucose is initially converted to pyruvate in the cytoplasm.
  • Fatty acids and pyruvate are transported to the mitochondria for complete oxidation to CO2 coupled by enzymes to the transport of electrons from NADH and FADH2 to oxygen and to the synthesis of ATP (oxidative phosphorylation) from ADP and Pi.
  • Pyruvate is transported into the mitochondria and converted to acetyl-CoA for oxidation via the citric acid cycle, involving pyruvate dehydrogenase components, dihydrolipoyl transacetylase, and dihydrolipoyl dehydrogenase.
  • Enzymes involved in the citric acid cycle include: citrate synthetase, aconitases, isocitrate dehydrogenase, alpha-ketoglutarate dehydrogenase complex including transsuccinylases, succinyl CoA synthetase, succinate dehydrogenase, fumarases, and malate dehydrogenase.
  • Acetyl CoA is oxidized to CO2 with concomitant formation of NADH, FADH2, and GTP.
  • In oxidative phosphorylation, the transfer of electrons from NADH and FADH2 to oxygen by dehydrogenases is coupled to the synthesis of ATP from ADP and Pi by the F0F1 ATPase complex in the mitochondrial inner membrane.
  • Enzyme complexes responsible for electron transport and ATP synthesis include the F0F1 ATPase complex, ubiquinone(CoQ)-cytochrome c reductase, ubiquinone reductase, cytochrome b, cytochrome c1, FeS protein, and cytochrome c oxidase.
  • ATP synthesis requires membrane transport enzymes including the phosphate transporter and the ATP-ADP antiport protein.
  • The ATP-binding cassette (ABC) superfamily has also been suggested as belonging to the mitochondrial transport group (Hogue, D.L. et al. (1999) J. Mol. Biol. 285:379-389). Brown fat uncoupling protein dissipates oxidative energy as heat, and may be involved in the fever response to infection and trauma (Cannon, B. et al. (1998) Ann. NY Acad. Sci. 856:171-187).
  • Mitochondria are oval-shaped organelles comprising an outer membrane, a tightly folded inner membrane, an intermembrane space between the outer and inner membranes, and a matrix inside the inner membrane.
  • The outer membrane contains many porin molecules that allow ions and charged molecules to enter the intermembrane space, while the inner membrane contains a variety of transport proteins that transfer only selected molecules.
  • Mitochondria are the primary sites of energy production in cells. Mitochondria contain a small amount of DNA.
  • Human mitochondrial DNA encodes 13 proteins, 22 tRNAs, and 2 rRNAs.
  • Mitochondrial-DNA encoded proteins include NADH-Q reductase, a cytochrome reductase subunit, cytochrome oxidase subunits, and ATP synthase subunits.
  • Cytochrome b5 is a central electron donor for various reductive reactions occurring on the cytoplasmic surface of liver endoplasmic reticulum. Cytochrome b5 has been found in Golgi, plasma, endoplasmic reticulum (ER), and microbody membranes.
  • Most mitochondrial proteins are encoded by nuclear genes, are synthesized on cytosolic ribosomes, and are imported into the mitochondria.
  • Nuclear-encoded proteins which are destined for the mitochondrial matrix typically contain positively-charged amino terminal signal sequences. Import of these preproteins from the cytoplasm requires a multisubunit protein complex in the outer membrane known as the translocase of outer mitochondrial membrane (TOM; previously designated MOM; Pfanner, N. et al. (1996) Trends Biochem. Sci. 21:51-52) and at least three inner membrane proteins which comprise the translocase of inner mitochondrial membrane (TTM; previously designated MTM; Pfanner, supra). An inside-negative membrane potential across the inner mitochondrial membrane is also required for preprotein import.
  • TOM translocase of outer mitochondrial membrane
  • TTM translocase of inner mitochondrial membrane
  • Preproteins are recognized by surface receptor components of the TOM complex and are translocated through a proteinaceous pore formed by other TOM components. Proteins targeted to the matrix are then recognized by the import machinery of the TTM complex.
  • the import systems of the outer and inner membranes can function independently (Segui-Real, B. et al. (1993) EMBO J. 12:2211-2218).
  • leader peptide is cleaved by a signal peptidase to generate the mature protein.
  • Most leader peptides are removed in a one step process by a protease termed mitochondrial processing peptidase (MPP) (Paces, V. et al. (1993) Proc. Natl.
  • MPP mitochondrial processing peptidase
  • mitochondrial intermediate peptidase
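The positively charged amino-terminal signal sequence described above can be illustrated with a crude charge count. This is a minimal sketch, not a validated targeting predictor: the function name `n_terminal_net_charge`, the 20-residue window, and both example sequences are illustrative assumptions, and histidine is ignored for simplicity.

```python
# Hypothetical helper: crude net charge of the N-terminal region.
POSITIVE = set("KR")  # lysine, arginine
NEGATIVE = set("DE")  # aspartate, glutamate


def n_terminal_net_charge(seq, window=20):
    """Return (#K + #R) - (#D + #E) over the first `window` residues."""
    head = seq.upper()[:window]
    pos = sum(1 for aa in head if aa in POSITIVE)
    neg = sum(1 for aa in head if aa in NEGATIVE)
    return pos - neg


# Invented sequences, for illustration only.
presequence_like = "MLRRASKLLRTAGRALSSTF"
acidic_control = "MDEEDLSAEDEELGADEEAAL"

print(n_terminal_net_charge(presequence_like))
print(n_terminal_net_charge(acidic_control))
```

A clearly positive value is consistent with, but does not establish, a matrix-targeting presequence; real predictors also consider amphipathicity and the cleavage site.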
  • mitochondrial dysfunction leads to impaired calcium buffering, generation of free radicals that may participate in deleterious intracellular and extracellular processes, changes in mitochondrial permeability and oxidative damage which is observed in several neurodegenerative diseases.
  • Neurodegenerative diseases linked to mitochondrial dysfunction include some forms of Alzheimer's disease, Friedreich's ataxia, familial amyotrophic lateral sclerosis, and Huntington's disease (Beal, M.F. (1998) Biochim. Biophys.
  • Multicellular organisms are composed of diverse cell types that differ dramatically both in structure and function.
  • the identity of a cell is determined by its characteristic pattern of gene expression, and different cell types express overlapping but distinctive sets of genes throughout development. Spatial and temporal regulation of gene expression is critical for the control of cell proliferation, cell differentiation, apoptosis, and other processes that contribute to organismal development.
  • gene expression is regulated in response to extracellular signals that mediate cell-cell communication and coordinate the activities of different cell types. Appropriate gene regulation also ensures that cells function efficiently by expressing only those genes whose functions are required at a given time.
  • Transcriptional regulatory proteins are essential for the control of gene expression. Some of these proteins function as transcription factors that initiate, activate, repress, or terminate gene transcription. Transcription factors generally bind to the promoter, enhancer, and upstream regulatory regions of a gene in a sequence-specific manner, although some factors bind regulatory elements within or downstream of a gene's coding region. Transcription factors may bind to a specific region of DNA singly or as a complex with other accessory factors. (Reviewed in Lewin, B. (1990) Genes IV, Oxford University Press, New York NY, and Cell Press, Cambridge MA, pp. 554-570.) The double helix structure and repeated sequences of DNA create topological and chemical features which can be recognized by transcription factors.
  • These features are hydrogen bond donor and acceptor groups, hydrophobic patches, major and minor grooves, and regular, repeated stretches of sequence which induce distinct bends in the helix.
  • transcription factors recognize specific DNA sequence motifs of about 20 nucleotides in length. Multiple, adjacent transcription factor-binding motifs may be required for gene regulation.
  • DNA-binding structural motifs which comprise either α helices or β sheets that bind to the major groove of DNA.
  • Four well-characterized structural motifs are helix-turn-helix, zinc finger, leucine zipper, and helix-loop-helix. Proteins containing these motifs may act alone as monomers, or they may form homo- or heterodimers that interact with DNA.
  • the helix-turn-helix motif consists of two α helices connected at a fixed angle by a short chain of amino acids. One of the helices binds to the major groove.
  • Helix-turn-helix motifs are exemplified by the homeobox motif which is present in homeodomain proteins.
  • the Antennapedia and Ultrabithorax proteins of Drosophila melanogaster are prototypical homeodomain proteins (Pabo, C.O. and R.T. Sauer (1992) Annu. Rev. Biochem. 61:1053-1095).
  • the zinc finger motif, which binds zinc ions, generally contains tandem repeats of about 30 amino acids consisting of periodically spaced cysteine and histidine residues. Examples of this sequence pattern, designated C2H2 and C3HC4 ("RING" finger), have been described (Lewin, supra).
  • Zinc finger proteins each contain an α helix and an antiparallel β sheet whose proximity and conformation are maintained by the zinc ion.
  • Contact with DNA is made by the arginine preceding the α helix and by the second, third, and sixth residues of the helix.
  • Variants of the zinc finger motif include poorly defined cysteine-rich motifs which bind zinc or other metal ions.
  • the leucine zipper motif comprises a stretch of amino acids rich in leucine which can form an amphipathic helix. This structure provides the basis for dimerization of two leucine zipper proteins. The region adjacent to the leucine zipper is usually basic, and upon protein dimerization, is optimally positioned for binding to the major groove. Proteins containing such motifs are generally referred to as bZIP transcription factors.
  • the helix-loop-helix motif consists of a short helix connected by a loop to a longer α helix. The loop is flexible and allows the two helices to fold back against each other and to bind to DNA.
  • the transcription factor Myc contains a prototypical HLH motif.
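The periodic spacing of cysteine/histidine residues in C2H2 zinc fingers and of leucines in leucine zippers lends itself to simple pattern matching. The sketch below is illustrative only: the two regular expressions are simplified consensus patterns assumed for this example (they are not given in the text above), the function name `find_motifs` is hypothetical, and a regex hit alone does not demonstrate a functional DNA-binding domain.

```python
import re

# Simplified, assumed consensus patterns (not taken from the source text):
#   C2H2 finger:     C-x(2,4)-C-x(12)-H-x(3,5)-H
#   leucine zipper:  L-x(6)-L-x(6)-L-x(6)-L  (a leucine every seventh residue)
C2H2 = re.compile(r"C.{2,4}C.{12}H.{3,5}H")
LEUCINE_ZIPPER = re.compile(r"L.{6}L.{6}L.{6}L")


def find_motifs(seq):
    """Return 0-based start positions of non-overlapping matches per motif."""
    seq = seq.upper()
    return {
        "C2H2": [m.start() for m in C2H2.finditer(seq)],
        "leucine_zipper": [m.start() for m in LEUCINE_ZIPPER.finditer(seq)],
    }


# Synthetic test sequences built to contain exactly one motif each.
finger = "AA" + "CAAC" + "A" * 12 + "HAAAH"
zipper = "GG" + "LAAAAAA" * 3 + "L"

print(find_motifs(finger))
print(find_motifs(zipper))
```

Curated profile databases use more specific patterns plus structural context, so this kind of scan is best treated as a first-pass filter.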
  • the immune system responds to infection or trauma by activating a cascade of events that coordinate the progressive selection, amplification, and mobilization of cellular defense mechanisms.
  • a complex and balanced program of gene activation and repression is involved in this process.
  • hyperactivity of the immune system as a result of improper or insufficient regulation of gene expression may result in considerable tissue or organ damage. This damage is well documented in immunological responses associated with arthritis, allergens, heart attack, stroke, and infections (Isselbacher, K.J. et al. (1996) Harrison's Principles of Internal Medicine, 13/e, McGraw Hill, Inc. and Teton Data Systems Software).
  • Eukaryotic cells are surrounded by plasma membranes which enclose the cell and maintain an environment inside the cell that is distinct from its surroundings.
  • eukaryotic organisms are distinct from prokaryotes in possessing many intracellular organelle and vesicle structures. Many of the metabolic reactions which distinguish eukaryotic biochemistry from prokaryotic biochemistry take place within these structures.
  • the plasma membrane and the membranes surrounding organelles and vesicles are composed of phosphoglycerides, fatty acids, cholesterol, phospholipids, glycolipids, proteoglycans, and proteins. These components confer identity and functionality to the membranes with which they associate.
  • TM proteins transmembrane proteins
  • TM domains are typically composed of 15 to 25 hydrophobic amino acids which are predicted to adopt an α-helical conformation.
  • TM proteins are classified as bitopic (Types I and II) and polytopic (Types III and IV) (Singer, S.J. (1990) Annu. Rev. Cell Biol. 6:247-296).
  • Bitopic proteins span the membrane once while polytopic proteins contain multiple membrane-spanning segments.
  • TM proteins function as cell-surface receptors, receptor-interacting proteins, transporters of ions or metabolites, ion channels, cell anchoring proteins, and cell type-specific surface antigens.
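Because TM domains are described above as stretches of 15 to 25 hydrophobic residues, a sliding-window hydropathy average is a common first screen for candidate membrane-spanning segments. The following is a minimal sketch under stated assumptions: it uses the Kyte-Doolittle hydropathy scale, and the 19-residue window and 1.6 cutoff are conventional choices assumed here rather than values from the text; `hydrophobic_windows` is a hypothetical name.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(seq, window=19, threshold=1.6):
    """Return (start, mean_hydropathy) for each window at or above threshold.

    Only the 20 standard one-letter codes are supported; any other
    character raises KeyError.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - window + 1):
        mean = sum(KD[aa] for aa in seq[i:i + window]) / window
        if mean >= threshold:
            hits.append((i, round(mean, 2)))
    return hits


# Synthetic example: a 19-residue poly-leucine core flanked by aspartate.
example = "D" * 5 + "L" * 19 + "D" * 5
for start, score in hydrophobic_windows(example):
    print(start, score)
```

Adjacent passing windows would normally be merged into one predicted segment; that step is omitted to keep the sketch short.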
  • MPs membrane proteins
  • PDZ domains; KDEL, RGD, NGR, and GSL sequence motifs
  • vWFA von Willebrand factor A
  • EGF-like domains
  • RGD, NGR, and GSL motif-containing peptides have been used as drug delivery agents in targeted cancer treatment of tumor vasculature (Arap, W. et al. (1998) Science 279:377-380).
  • MPs may also contain amino acid sequence motifs, such as the carbohydrate recognition domain (CRD), that mediate interactions with extracellular or intracellular molecules.
  • CRD carbohydrate recognition domain
  • GPCR G-protein coupled receptors
  • GPCRs include receptors for biogenic amines, lipid mediators of inflammation, peptide hormones, and sensory signal mediators.
  • the structure of these highly-conserved receptors consists of seven hydrophobic transmembrane regions, an extracellular N-terminus, and a cytoplasmic C-terminus.
  • Three extracellular loops alternate with three intracellular loops to link the seven transmembrane regions. Cysteine disulfide bridges connect the second and third extracellular loops.
  • the most conserved regions of GPCRs are the transmembrane regions and the first two cytoplasmic loops.
  • a conserved, acidic-Arg-aromatic residue triplet present in the second cytoplasmic loop may interact with G proteins.
  • a GPCR consensus pattern is characteristic of most proteins belonging to this superfamily (ExPASy PROSITE document PS00237; and Watson, S. and S. Arkinstall (1994) The G-protein Linked Receptor Facts Book, Academic Press, San Diego CA, pp. 2-6). Mutations and changes in transcriptional activation of GPCR-encoding genes have been associated with neurological disorders such as schizophrenia, Parkinson's disease, Alzheimer's disease, drug addiction, and feeding disorders.
  Scavenger Receptors
  • Macrophage scavenger receptors with broad ligand specificity may participate in the binding of low density lipoproteins (LDL) and foreign antigens.
  • Scavenger receptors types I and II are trimeric membrane proteins with each subunit containing a small N-terminal intracellular domain, a transmembrane domain, a large extracellular domain, and a C-terminal cysteine-rich domain.
  • the extracellular domain contains a short spacer region, an α-helical coiled-coil region, and a triple helical collagen-like region.
  • ligands include chemically modified lipoproteins and albumin, polyribonucleotides, polysaccharides, phospholipids, and asbestos (Matsumoto, A. et al. (1990) Proc. Natl. Acad. Sci. USA 87:9133-9137; and Elomaa, O. et al. (1995) Cell 80:603-609).
  • the scavenger receptors are thought to play a key role in atherogenesis by mediating uptake of modified LDL in arterial walls, and in host defense by binding bacterial endotoxins, bacteria, and protozoa.
  Tetraspan Family Proteins
  • the transmembrane 4 superfamily (TM4SF) or tetraspan family is a multigene family encoding type III integral membrane proteins (Wright, M.D. and M.G. Tomlinson (1994) Immunol. Today 15:588-594).
  • the TM4SF is comprised of membrane proteins which traverse the cell membrane four times.
  • Members of the TM4SF include platelet and endothelial cell membrane proteins, melanoma-associated antigens, leukocyte surface glycoproteins, colonal carcinoma antigens, tumor-associated antigens, and surface proteins of the schistosome parasites (Jankowski, S.A. (1994) Oncogene 9:1205-1211).
  • Members of the TM4SF share about 25-30% amino acid sequence identity with one another.
  • TM4SF members have been implicated in signal transduction, control of cell adhesion, regulation of cell growth and proliferation, including development and oncogenesis, and cell motility, including tumor cell metastasis.
  • Expression of TM4SF proteins is associated with a variety of tumors and the level of expression may be altered when cells are growing or activated.
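The 25-30% amino acid sequence identity quoted above for TM4SF members is a pairwise statistic computed over an alignment. The sketch below shows one way such a figure could be calculated, assuming the two sequences have already been aligned with `-` as the gap character. The function name `percent_identity` and the choice to exclude gapped columns from the denominator are assumptions; other tools normalize by alignment length or by the shorter sequence and will report different values.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy alignment: 8 of 10 ungapped columns are identical.
print(percent_identity("ACDEFGHIKL", "ACDQFGHIRL"))
```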
  • Tumor antigens are cell surface molecules that are differentially expressed in tumor cells relative to normal cells. Tumor antigens distinguish tumor cells immunologically from normal cells and provide diagnostic and therapeutic targets for human cancers (Takagi, S. et al. (1995) Int. J. Cancer 61:706-715; Liu, E. et al. (1992) Oncogene 7:1027-1032).
  Leukocyte Antigens
  • cell surface antigens include those identified on leukocytic cells of the immune system. These antigens have been identified using systematic, monoclonal antibody (mAb)-based
  • CD antigens have been characterized as both transmembrane proteins and cell surface proteins anchored to the plasma membrane via covalent attachment to fatty acid-containing glycolipids such as glycosylphosphatidylinositol (GPI).
  • GPI glycosylphosphatidylinositol
  • Ion channels are found in the plasma membranes of virtually every cell in the body.
  • chloride channels mediate a variety of cellular functions including regulation of membrane potentials and absorption and secretion of ions across epithelial membranes.
  • Chloride channels also regulate the pH of organelles such as the Golgi apparatus and endosomes (see, e.g., Greger, R. (1988) Annu. Rev. Physiol. 50:111-122).
  • Electrophysiological and pharmacological properties of chloride channels, including ion conductance, current-voltage relationships, and sensitivity to modulators, suggest that different chloride channels exist in muscles, neurons, fibroblasts, epithelial cells, and lymphocytes.
  • ion channels have sites for phosphorylation by one or more protein kinases including protein kinase A, protein kinase C, tyrosine kinase, and casein kinase II, all of which regulate ion channel activity in cells.
  • Inappropriate phosphorylation of proteins in cells has been linked to changes in cell cycle progression and cell differentiation. Changes in the cell cycle have been linked to induction of apoptosis or cancer. Changes in cell differentiation have been linked to diseases and disorders of the reproductive system, immune system, skeletal muscle, and other organ systems.
  • Proton ATPases comprise a large class of membrane proteins that use the energy of ATP hydrolysis to generate an electrochemical proton gradient across a membrane. The resultant gradient may be used to transport other ions across the membrane (Na+, K+, or Cl-) or to maintain organelle pH.
  • Proton ATPases are further subdivided into the mitochondrial F-ATPases, the plasma membrane ATPases, and the vacuolar ATPases.
  • the vacuolar ATPases establish and maintain an acidic pH within various organelles involved in the processes of endocytosis and exocytosis (Mellman, I. et al. (1986) Annu. Rev. Biochem.
  • Proton-coupled, 12 membrane-spanning domain transporters such as PEPT 1 and PEPT 2 are responsible for gastrointestinal absorption and for renal reabsorption of peptides using an electrochemical H + gradient as the driving force.
  • Another type of peptide transporter, the TAP transporter, is a heterodimer consisting of TAP 1 and TAP 2 and is associated with antigen processing. Peptide antigens are transported across the membrane of the endoplasmic reticulum by TAP so they can be expressed on the cell surface in association with MHC molecules.
  • Each TAP protein consists of multiple hydrophobic membrane spanning segments and a highly conserved ATP-binding cassette (Boll, M. et al. (1996) Proc. Natl. Acad. Sci. USA 93:284-289).
  • Pathogenic microorganisms, such as herpes simplex virus, may encode inhibitors of TAP-mediated peptide transport in order to evade immune surveillance (Marusina, K. and J.J. Monaco (1996) Curr. Opin. Hematol. 3:19-26).
  • the ATP-binding cassette (ABC) transporters, also called the "traffic ATPases", comprise a superfamily of membrane proteins that mediate transport and channel functions in prokaryotes and eukaryotes (Higgins, C.F. (1992) Annu. Rev. Cell Biol. 8:67-113). ABC proteins share a similar overall structure and significant sequence homology. All ABC proteins contain a conserved domain of approximately two hundred amino acid residues which includes one or more nucleotide binding domains.
  • ABC transporter genes are associated with various disorders, such as hyperbilirubinemia II/Dubin-Johnson syndrome, recessive Stargardt's disease, X-linked adrenoleukodystrophy, multidrug resistance, celiac disease, and cystic fibrosis.
  Peripheral and Anchored Membrane Proteins
  • Membrane anchors are covalently joined to a protein post-translationally and include such moieties as prenyl, myristyl, and glycosylphosphatidyl inositol groups.
  • Membrane localization of peripheral and anchored proteins is important for their function in processes such as receptor-mediated signal transduction. For example, prenylation of Ras is required for its localization to the plasma membrane and for its normal and oncogenic functions in signal transduction.
  • Intercellular communication is essential for the development and survival of multicellular organisms.
  • Cells communicate with one another through the secretion and uptake of protein signaling molecules.
  • the uptake of proteins into the cell is achieved by the endocytic pathway, in which the interaction of extracellular signaling molecules with plasma membrane receptors results in the formation of plasma membrane-derived vesicles that enclose and transport the molecules into the cytosol. These transport vesicles fuse with and mature into endosomal and lysosomal (digestive) compartments.
  • the secretion of proteins from the cell is achieved by exocytosis, in which molecules inside of the cell proceed through the secretory pathway. In this pathway, molecules transit from the
  • vesicles form at the transitional endoplasmic reticulum (tER), the rim of Golgi cisternae, the face of the Trans-Golgi Network (TGN), the plasma membrane (PM), and tubular extensions of the endosomes.
  • tER transitional endoplasmic reticulum
  • TGN Trans-Golgi Network
  • PM plasma membrane
  • vesicle formation occurs when a region of membrane buds off from the donor organelle.
  • the membrane-bound vesicle contains proteins to be transported and is surrounded by a proteinaceous coat, the components of which are recruited from the cytosol. Two different classes of coat protein have been identified.
  • Clathrin coats form on vesicles derived from the TGN and PM, whereas coatomer (COP) coats form on vesicles derived from the ER and Golgi.
  • COP coats can be further classified as COPI, involved in retrograde traffic through the Golgi and from the Golgi to the ER, and COPII, involved in anterograde traffic from the ER to the Golgi (Mellman, supra).
  • adapter proteins bring vesicle cargo and coat proteins
  • Adapter protein-1 and -2 select cargo from the TGN and plasma membrane, respectively, based on molecular information encoded on the cytoplasmic tail of integral membrane cargo proteins.
  • Adapter proteins also recruit clathrin to the bud site.
  • Clathrin is a protein complex consisting of three large and three small polypeptide chains arranged in a three-legged structure called a triskelion. Multiple triskelions and other coat proteins appear to self-
  • This assembly process may serve to deform the membrane into a budding vesicle.
  • GTP-bound ADP-ribosylation factor (Arf) is also incorporated into the coated assembly.
  • Another small G-protein, dynamin, forms a ring complex around the neck of the forming vesicle and may provide the mechanochemical force to seal the bud, thereby releasing the vesicle.
  • the coated vesicle complex is then transported through the cytosol. During the transport
  • Arf-bound GTP is hydrolyzed to GDP, and the coat dissociates from the transport vesicle (West, M.A. et al. (1997) J. Cell Biol. 138:1239-1254).
  • COP coat protein
  • Coatomer is an equimolar complex of seven proteins, termed alpha-, beta-, beta'-, gamma-, delta-, epsilon- and zeta-COP.
  • the coatomer complex binds to dilysine motifs contained on the cytoplasmic tails of integral membrane proteins. These include the KKXX retrieval motif of membrane proteins of the ER and dibasic/diphenylalanine motifs of members of the p24 family.
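The KKXX retrieval motif mentioned above can be checked with a one-line pattern. This is a minimal sketch, assuming the motif must sit at the extreme C-terminus of the sequence (two lysines followed by any two residues); the function name `has_kkxx` and both example sequences are invented for illustration.

```python
import re

# Assumed form of the motif: K-K-X-X at the very end of the sequence.
KKXX = re.compile(r"KK[A-Z]{2}$")


def has_kkxx(seq):
    """True if the sequence ends with a dilysine KKXX motif."""
    return bool(KKXX.search(seq.upper()))


print(has_kkxx("MAAALLVVKKTN"))  # motif at the C-terminus
print(has_kkxx("MKKTNAAALLVV"))  # same residues, but not C-terminal
```

A positive hit is only meaningful for proteins whose C-terminal tail faces the cytoplasm, which this check cannot determine.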
  • Eukaryotic cells are organized into various cellular organelles which has the effect of separating specific molecules and their functions from one another and from the cytosol.
  • various membrane structures surround and define these organelles while allowing them to interact with one another and the cell environment through both active and passive transport processes.
  • Important cell organelles include the nucleus, the Golgi apparatus, the endoplasmic reticulum, mitochondria, peroxisomes, lysosomes, endosomes, and secretory vesicles.
  • the cell nucleus contains all of the genetic information of the cell in the form of DNA, and the components and machinery necessary for replication of DNA and for transcription of DNA into RNA.
  • DNA is organized into compact structures in the nucleus by interactions with various DNA-binding proteins such as histones and non-histone chromosomal proteins.
  • DNA-specific nucleases, DNAses, partially degrade these compacted structures prior to DNA replication or transcription.
  • DNA replication takes place with the aid of DNA helicases which unwind the double-stranded DNA helix, and DNA polymerases that duplicate the separated DNA strands.
  • Transcriptional regulatory proteins are essential for the control of gene expression. Some of these proteins function as transcription factors that initiate, activate, repress, or terminate gene transcription. Transcription factors generally bind to the promoter, enhancer, and upstream regulatory regions of a gene in a sequence-specific manner, although some factors bind regulatory elements within or downstream of a gene's coding region. Transcription factors may bind to a specific region of DNA singly or as a complex with other accessory factors. (Reviewed in Lewin, B. (1990) Genes IV, Oxford University Press, New York NY, and Cell Press, Cambridge MA, pp. 554-570.) Many transcription factors incorporate DNA-binding structural motifs which comprise either α helices or β sheets that bind to the major groove of DNA.
  • Malignant cell growth may result from either excessive expression of tumor promoting genes or insufficient expression of tumor suppressor genes (Cleary, M.L. (1992) Cancer Surv. 15:89-104). Chromosomal translocations may also produce chimeric loci which fuse the coding sequence of one gene with the regulatory regions of a second unrelated gene. Such an arrangement likely results in inappropriate gene transcription, potentially contributing to malignancy.
  • the immune system responds to infection or trauma by activating a cascade of events that coordinate the progressive selection, amplification, and mobilization of cellular defense mechanisms.
  • a complex and balanced program of gene activation and repression is involved in this process.
  • hyperactivity of the immune system as a result of improper or insufficient regulation of gene expression may result in considerable tissue or organ damage. This damage is well documented in immunological responses associated with arthritis, allergens, heart attack, stroke, and infections (Isselbacher, K.J. et al. (1996) Harrison's Principles of Internal Medicine, 13/e, McGraw Hill, Inc. and Teton Data Systems Software).
  • RNA polymerase I makes large ribosomal RNAs
  • RNA polymerase III makes a variety of small, stable RNAs including 5S ribosomal RNA and the transfer RNAs (tRNA).
  • RNA polymerase II transcribes genes that will be translated into proteins.
  • the primary transcript of RNA polymerase II is called heterogeneous nuclear RNA (hnRNA), and must be further processed by splicing to remove non-coding sequences called introns.
  • RNA splicing is mediated by small nuclear ribonucleoprotein complexes, or snRNPs, producing mature messenger RNA (mRNA) which is then transported out of the nucleus for translation into proteins.
  • mRNA messenger RNA
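The splicing step described above, in which introns are removed from the primary transcript to yield mature mRNA, can be modeled as simple string surgery once intron coordinates are known. The sketch below is a toy illustration: the function name `splice`, the 0-based half-open coordinate convention, and the example transcript are all assumptions, and it does not model how snRNPs recognize splice sites.

```python
def splice(pre_mrna, introns):
    """Remove introns from a primary transcript and join the exons.

    `introns` is a list of (start, end) pairs, 0-based and half-open,
    which must lie within the transcript and must not overlap.
    """
    exons = []
    pos = 0
    for start, end in sorted(introns):
        if start < pos or start >= end or end > len(pre_mrna):
            raise ValueError(f"invalid or overlapping intron: {(start, end)}")
        exons.append(pre_mrna[pos:start])  # exon preceding this intron
        pos = end
    exons.append(pre_mrna[pos:])  # final exon
    return "".join(exons)


# Invented transcript: exon 1 + intron (GU...AG) + exon 2.
exon1, intron, exon2 = "AUGGCC", "GUAAGUUUCAG", "UUUUAA"
hnrna = exon1 + intron + exon2
print(splice(hnrna, [(len(exon1), len(exon1) + len(intron))]))
```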
  • the nucleolus is a highly organized subcompartment in the nucleus that contains high concentrations of RNA and proteins and functions mainly in ribosomal RNA synthesis and assembly (Alberts, et al. supra, pp. 379-382).
  • Ribosomal RNA is a structural RNA that is complexed with proteins to form ribonucleoprotein structures called ribosomes. Ribosomes provide the platform on which protein synthesis takes place.
  • Ribosomes are assembled in the nucleolus initially from a large, 45S rRNA combined with a variety of proteins imported from the cytoplasm, as well as smaller, 5S rRNAs. Later processing of the immature ribosome results in formation of smaller ribosomal subunits which are transported from the nucleolus to the cytoplasm where they are assembled into functional ribosomes.
  Endoplasmic Reticulum
  • proteins are synthesized within the endoplasmic reticulum (ER), delivered from the ER to the Golgi apparatus for post-translational processing and sorting, and transported from the Golgi to specific intracellular and extracellular destinations. Synthesis of integral membrane proteins, secreted proteins, and proteins destined for the lumen of a particular organelle occurs on the rough endoplasmic reticulum (ER).
  • the rough ER is so named because of the rough appearance in electron micrographs imparted by the attached ribosomes on which protein synthesis proceeds.
  • Synthesis of proteins destined for the ER actually begins in the cytosol with the synthesis of a specific signal peptide which directs the growing polypeptide and its attached ribosome to the ER membrane where the signal peptide is removed and protein synthesis is completed.
  • Soluble proteins destined for the ER lumen, for secretion, or for transport to the lumen of other organelles pass completely into the ER lumen.
  • Transmembrane proteins destined for the ER or for other cell membranes are translocated across the ER membrane but remain anchored in the lipid bilayer of the membrane by one or more membrane-spanning α-helical regions.
  • Translocated polypeptide chains destined for other organelles or for secretion also fold and assemble in the ER lumen with the aid of certain "resident" ER proteins. Protein folding in the ER is aided by two principal types of protein isomerases, protein disulfide isomerase (PDI), and peptidyl-prolyl isomerase (PPI).
  • PDI protein disulfide isomerase
  • PPI peptidyl-prolyl isomerase
  • PPI, an enzyme that catalyzes the isomerization of certain proline imide bonds in oligopeptides and proteins, is considered to govern one of the rate limiting steps in the folding of many proteins to their final functional conformation.
  • the cyclophilins represent a major class of PPI that was originally identified as the major receptor for the immunosuppressive drug cyclosporin A (Handschumacher, R.E. et al. (1984) Science 226:544-547).
  • Molecular "chaperones" such as BiP (binding protein) in the ER recognize incorrectly folded proteins as well as proteins not yet folded into their final form and bind to them, both to prevent improper aggregation between them, and to promote proper folding.
  • the Golgi apparatus is a complex structure that lies adjacent to the ER in eukaryotic cells and serves primarily as a sorting and dispatching station for products of the ER (Alberts, et al. supra, pp. 600-610). Additional posttranslational processing, principally additional glycosylation, also occurs in the Golgi. Indeed, the Golgi is a major site of carbohydrate synthesis, including most of the glycosaminoglycans of the extracellular matrix. N-linked oligosaccharides, added to proteins in the ER, are also further modified in the Golgi by the addition of more sugar residues to form complex N-linked oligosaccharides.
  • the terminal compartment of the Golgi is the Trans-Golgi Network (TGN), where both membrane and lumenal proteins are sorted for their final destination.
  • TGN Trans-Golgi Network
  • Other transport vesicles bud off containing proteins destined for the plasma membrane, such as receptors, adhesion molecules, and ion channels, and secretory proteins, such as hormones, neurotransmitters, and digestive enzymes.
  • the vacuole system is a collection of membrane bound compartments in eukaryotic cells that functions in the processes of endocytosis and exocytosis. They include phagosomes, lysosomes, endosomes, and secretory vesicles.
  • Endocytosis is the process in cells of internalizing nutrients, solutes or small particles (pinocytosis) or large particles such as internalized receptors, viruses, bacteria, or bacterial toxins (phagocytosis).
  • Exocytosis is the process of transporting molecules to the cell surface. It facilitates placement or localization of membrane-bound receptors or other membrane proteins and secretion of hormones, neurotransmitters, digestive enzymes, wastes, etc.
  • a common property of all of these vacuoles is an acidic pH environment ranging from approximately pH 4.5-5.0. This acidity is maintained by the presence of a proton ATPase that uses the energy of ATP hydrolysis to generate an electrochemical proton gradient across a membrane (Mellman, I. et al. (1986) Annu. Rev. Biochem. 55:663-700).
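To put the pH 4.5-5.0 range quoted above in concrete terms, pH can be converted to proton concentration using the definition pH = -log10[H+]. The following sketch is a worked arithmetic example only; the function names are hypothetical, and the reference pH of 7.0 is an assumed round number for comparison, not a value stated in the text.

```python
def proton_concentration(ph):
    """Proton concentration in mol/L from pH, using pH = -log10[H+]."""
    return 10.0 ** (-ph)


def fold_acidity(ph_compartment, ph_reference=7.0):
    """How many times higher [H+] is in the compartment than at the reference pH."""
    return 10.0 ** (ph_reference - ph_compartment)


for ph in (4.5, 5.0):
    print(ph, proton_concentration(ph), fold_acidity(ph))
```

Each pH unit corresponds to a tenfold change in proton concentration, so a compartment at pH 5.0 holds protons at 100 times the concentration of a pH 7.0 reference.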
  • Eukaryotic vacuolar proton ATPase (vp-ATPase) is a multimeric enzyme composed of 3-10 different subunits.
  • One of these subunits is a highly hydrophobic polypeptide of approximately 16 kDa that is similar to the proteolipid component of vp-ATPases from eubacteria, fungi, and plant vacuoles (Mandel, M. et al. (1988) Proc. Natl. Acad. Sci. USA 85:5521-5524).
  • the 16 kDa proteolipid component is the major subunit of the membrane portion of vp-ATPase and functions in the transport of protons across the membrane.
  Lysosomes
  • Lysosomes are membranous vesicles containing various hydrolytic enzymes used for the controlled intracellular digestion of macromolecules. Lysosomes contain some 40 types of enzymes including proteases, nucleases, glycosidases, lipases, phospholipases, phosphatases, and sulfatases, all of which are acid hydrolases that function at a pH of about 5. Lysosomes are surrounded by a unique membrane containing transport proteins that allow the final products of macromolecule degradation, such as sugars, amino acids, and nucleotides, to be transported to the cytosol where they may be either excreted or reutilized by the cell. A vp-ATPase, such as that described above, maintains the acidic environment necessary for hydrolytic activity (Alberts, supra, pp. 610-611).
  Endosomes
  • Endosomes are another type of acidic vacuole that is used to transport substances from the cell surface to the interior of the cell in the process of endocytosis. Like lysosomes, endosomes have an acidic environment provided by a vp-ATPase (Alberts et al. supra, pp. 610-618). Two types of endosomes are apparent based on tracer uptake studies that distinguish their time of formation in the cell and their cellular location. Early endosomes are found near the plasma membrane and appear to function primarily in the recycling of internalized receptors back to the cell surface.
  • Late endosomes appear later in the endocytic process close to the Golgi apparatus and the nucleus, and appear to be associated with delivery of endocytosed material to lysosomes or to the TGN where they may be recycled.
  • Specific proteins are associated with particular transport vesicles and their target compartments that may provide selectivity in targeting vesicles to their proper compartments.
  • a cytosolic prenylated GTP-binding protein, Rab, is one such protein. Rabs 4, 5, and 11 are associated with the early endosome, whereas Rabs 7 and 9 associate with the late endosome.
  • Mitochondria are oval-shaped organelles comprising an outer membrane, a tightly folded inner membrane, an intermembrane space between the outer and inner membranes, and a matrix inside the inner membrane.
  • the outer membrane contains many porin molecules that allow ions and charged molecules to enter the intermembrane space, while the inner membrane contains a variety of transport proteins that transfer only selected molecules.
  • Mitochondria are the primary sites of energy production in cells.
  • Glucose is initially converted to pyruvate in the cytoplasm.
  • Fatty acids and pyruvate are transported to the mitochondria for complete oxidation to CO2 coupled by enzymes to the transport of electrons from NADH and FADH2 to oxygen and to the synthesis of ATP (oxidative phosphorylation) from ADP and Pi.
  • Pyruvate is transported into the mitochondria and converted to acetyl-CoA for oxidation via the citric acid cycle, involving pyruvate dehydrogenase components, dihydrolipoyl transacetylase, and dihydrolipoyl dehydrogenase.
  • Enzymes involved in the citric acid cycle include: citrate synthetase, aconitases, isocitrate dehydrogenase, alpha-ketoglutarate dehydrogenase complex including transsuccinylases, succinyl CoA synthetase, succinate dehydrogenase, fumarases, and malate dehydrogenase.
  • Acetyl CoA is oxidized to CO2 with concomitant formation of NADH, FADH2, and GTP.
  • In oxidative phosphorylation, the transfer of electrons from NADH and FADH2 to oxygen by dehydrogenases is coupled to the synthesis of ATP from ADP and Pi by the F0F1 ATPase complex in the mitochondrial inner membrane.
  • Enzyme complexes responsible for electron transport and ATP synthesis include the F0F1 ATPase complex, ubiquinone(CoQ)-cytochrome c reductase, ubiquinone reductase, cytochrome b, cytochrome c1, FeS protein, and cytochrome c oxidase.
  • Peroxisomes, like mitochondria, are a major site of oxygen utilization. They contain one or more enzymes, such as catalase and urate oxidase, that use molecular oxygen to remove hydrogen atoms from specific organic substrates in an oxidative reaction that produces hydrogen peroxide.
  • Catalase oxidizes a variety of substrates including phenols, formic acid, formaldehyde, and alcohol and is important in peroxisomes of liver and kidney cells for detoxifying various toxic molecules that enter the bloodstream.
  • Another major function of oxidative reactions in peroxisomes is the breakdown of fatty acids in a process called β-oxidation. β-oxidation results in shortening of the alkyl chain of fatty acids by blocks of two carbon atoms that are converted to acetyl
  • Peroxisomes import their proteins from the cytosol using a specific signal sequence located near the C-terminus of the protein.
  • The importance of this import process is evident in the inherited human disease Zellweger syndrome, in which a defect in importing proteins into peroxisomes leads to a peroxisomal deficiency resulting in severe abnormalities in the brain, liver, and kidneys, and death soon after birth.
  • One form of this disease has been shown to be due to a mutation in the gene encoding a peroxisomal integral membrane protein called peroxisome assembly factor-1.
  • The discovery of new human molecules satisfies a need in the art by providing new compositions which are useful in the diagnosis, study, prevention, and treatment of diseases associated with, as well as effects of exogenous compounds on, the expression of human molecules.
  • The present invention relates to nucleic acid sequences comprising human diagnostic and therapeutic polynucleotides (dithp) as presented in the Sequence Listing.
  • The dithp uniquely identify genes encoding human structural, functional, and regulatory molecules.
  • The invention provides an isolated polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d).
  • The polynucleotide comprises a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56.
  • The polynucleotide comprises at least 30 contiguous nucleotides of a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; b) a polynucleotide comprising a naturally occurring polynucleotide comprising a polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of
  • The polynucleotide comprises at least 60 contiguous nucleotides of a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; b) a polynucleotide comprising a naturally occurring polynucleotide comprising a polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d).
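The two numeric criteria recited above (a sequence "at least 90% identical" and a run of "at least 30" or "at least 60" contiguous nucleotides) can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, identity is computed position by position over two sequences that are already aligned and of equal length (no gaps), and this is not the alignment method used to evaluate the sequences of the invention.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of matching positions between two pre-aligned, equal-length sequences.

    Assumption: the sequences are already aligned without gaps.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def shares_contiguous_run(query: str, reference: str, min_len: int = 30) -> bool:
    """True if any window of min_len contiguous nucleotides of query occurs in reference."""
    query, reference = query.upper(), reference.upper()
    for i in range(len(query) - min_len + 1):
        if query[i:i + min_len] in reference:
            return True
    return False


if __name__ == "__main__":
    # Toy sequences, far shorter than real templates.
    print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
    print(shares_contiguous_run("TTACGTACGG", "GGACGTACTT", min_len=6))
```

With real data, a gapped alignment produced by a dedicated tool would be needed before an identity percentage is meaningful.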
  • The invention further provides a composition for the detection of expression of human diagnostic and therapeutic polynucleotides comprising at least one isolated polynucleotide comprising a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d); and a detectable label.
  • The invention also provides a method for detecting a target polynucleotide in a sample, said target polynucleotide having a polynucleotide sequence of a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence of a polynucleotide selected from the group consisting of SEQ ID NO:1-56; b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d).
  • The method comprises a) amplifying said target polynucleotide or fragment thereof using polymerase chain reaction amplification, and b) detecting the presence or absence of said amplified target polynucleotide or fragment thereof, and, optionally, if present, the amount thereof.
  • The invention also provides a method for detecting a target polynucleotide in a sample, said target polynucleotide having a polynucleotide sequence of a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d).
  • The method comprises a) hybridizing the sample with a probe comprising at least 20 contiguous nucleotides comprising a sequence complementary to said target polynucleotide in the sample, and which probe specifically hybridizes to said target polynucleotide, under conditions whereby a hybridization complex is formed between said probe and said target polynucleotide, and b) detecting the presence or absence of said hybridization complex, and, optionally, if present, the amount thereof.
  • The invention provides a composition comprising a target polynucleotide of the method, wherein said probe comprises at least 30 contiguous nucleotides.
  • The invention provides a composition comprising a target polynucleotide of the method, wherein said probe comprises at least 60 contiguous nucleotides.
  • The invention further provides a recombinant polynucleotide comprising a promoter sequence operably linked to an isolated polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d).
  • The invention provides a cell transformed with the recombinant polynucleotide.
  • the invention provides a o trans
  • The invention also provides a method for producing a human diagnostic and therapeutic polypeptide, the method comprising a) culturing a cell under conditions suitable for expression of the human diagnostic and therapeutic polypeptide, wherein said cell is transformed with a recombinant polynucleotide, said recombinant polynucleotide comprising an isolated polynucleotide selected from the group consisting of i) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; ii) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; iii) a polynucleotide complementary to the polynucleotide of i); iv) a polynucleotide complementary to the polynu
  • The invention also provides an isolated human diagnostic and therapeutic polypeptide (DITHP) encoded by at least one polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56.
  • The invention further provides a method of screening for a test compound that specifically binds to the polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:57-113.
  • The method comprises a) combining the polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:57-113 with at least one test compound under suitable conditions, and b) detecting binding of the polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:57-113 to the test compound, thereby identifying a compound that specifically binds to the polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:57-113.
  • The invention further provides a microarray wherein at least one element of the microarray is an isolated polynucleotide comprising at least 30 contiguous nucleotides of a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d).
  • The invention also provides a method for generating a transcript image of a sample which contains polynucleotides.
  • The method comprises a) labeling the polynucleotides of the sample, b) contacting the elements of the microarray with the labeled polynucleotides of the sample under conditions suitable for the formation of a hybridization complex, and c) quantifying the expression of the polynucleotides in the sample.
  • The invention provides a method for screening a compound for effectiveness in altering expression of a target polynucleotide, wherein said target polynucleotide comprises a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d).
  • The method comprises a) exposing a sample comprising the target polynucleotide to a compound, b) detecting altered expression of the target polynucleotide, and c) comparing the expression of the target polynucleotide in the presence of varying amounts of the compound and in the absence of the compound.
  • The invention further provides a method for assessing toxicity of a test compound, said method comprising a) treating a biological sample containing nucleic acids with the test compound; b) hybridizing the nucleic acids of the treated biological sample with a probe comprising at least 20 contiguous nucleotides of a polynucleotide selected from the group consisting of i) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; ii) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; iii) a polynucleotide complementary to the polynucleotide of i); iv) a polynucleotide complementary to the polynucleotide of ii); and v) an
  • Hybridization occurs under conditions whereby a specific hybridization complex is formed between said probe and a target polynucleotide in the biological sample, said target polynucleotide comprising a polynucleotide sequence of a polynucleotide selected from the group consisting of i) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; ii) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; iii) a polynucleotide complementary to the polynucleotide of i); iv) a polynucleotide complementary to the polynucleotide of ii); and v) an RNA equivalent of i) through
  • The invention further provides an isolated polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:57-113.
  • The invention provides an isolated polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:57-113.
  • The invention further provides an isolated polynucleotide encoding a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:57-113.
  • The polynucleotide encodes a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:57-113. In another alternative, the polynucleotide comprises a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56.
  • The invention provides an isolated antibody which specifically binds to a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:57-113.
  • The invention further provides a composition comprising a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, and a pharmaceutically acceptable excipient.
  • The composition comprises a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:57-113.
  • The invention additionally provides a method of treating a disease or condition associated with decreased expression of functional DITHP, comprising administering to a patient in need of such treatment the composition.
  • The invention also provides a method for screening a compound for effectiveness as an agonist of a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:57-113.
  • The method comprises a) exposing a sample comprising the polypeptide to a compound, and b) detecting agonist activity in the sample.
  • The invention provides a composition comprising an agonist compound identified by the method and a pharmaceutically acceptable excipient.
  • The invention provides a method of treating a disease or condition associated with decreased expression of functional DITHP, comprising administering to a patient in need of such treatment the composition.
  • The invention provides a method for screening a compound for effectiveness as an antagonist of a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:57-113.
  • The method comprises a) exposing a sample comprising the polypeptide to a compound, and b) detecting antagonist activity in the sample.
  • The invention provides a composition comprising an antagonist compound identified by the method and a pharmaceutically acceptable excipient.
  • The invention provides a method of treating a disease or condition associated with overexpression of functional DITHP, comprising administering to a patient in need of such treatment the composition.
  • The invention further provides a method of screening for a compound that modulates the activity of a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:57-113.
  • The method comprises a) combining the polypeptide with at least one test compound under conditions permissive for the activity of the polypeptide, b) assessing the activity of the polypeptide in the presence of the test compound, and c) comparing the activity of the polypeptide in the presence of the test compound with the activity of the polypeptide in the absence of the test compound, wherein a change in the activity of the polypeptide in the presence of the test compound is indicative of a compound that modulates the activity of the polypeptide.
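The comparison in step c) above can be sketched as a simple ratio test. This is a minimal sketch under stated assumptions, not a prescribed assay analysis: the function name `classify_modulation` and the 1.5-fold threshold are hypothetical, and a real screen would replicate measurements and apply a statistical test rather than a fixed cutoff.

```python
def classify_modulation(activity_with: float,
                        activity_without: float,
                        min_fold_change: float = 1.5) -> str:
    """Compare activity measured with and without a test compound.

    Assumptions: activities are non-negative measurements in the same units,
    and min_fold_change (greater than 1) is an arbitrary illustrative cutoff.
    """
    if activity_without <= 0 or activity_with < 0:
        raise ValueError("control activity must be positive, test activity non-negative")
    if min_fold_change <= 1:
        raise ValueError("min_fold_change must be greater than 1")
    ratio = activity_with / activity_without
    if ratio >= min_fold_change:
        return "increased"
    if ratio <= 1 / min_fold_change:
        return "decreased"
    return "unchanged"


if __name__ == "__main__":
    # Hypothetical readings in arbitrary units.
    print(classify_modulation(30.0, 10.0))
    print(classify_modulation(5.0, 10.0))
    print(classify_modulation(10.0, 10.0))
```

A result other than "unchanged" corresponds to the "change in the activity" that the method treats as indicative of a modulating compound.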
  • Table 1 shows the sequence identification numbers (SEQ ID NO:s) and template identification numbers (template IDs) corresponding to the polynucleotides of the present invention, along with the sequence identification numbers (SEQ ID NO:s) and open reading frame identification numbers (ORF IDs) corresponding to polypeptides encoded by the template ID.
  • Table 2 shows the sequence identification numbers (SEQ ID NO:s) and template identification numbers (template IDs) corresponding to the polynucleotides of the present invention, along with their GenBank hits (GI Numbers), probability scores, and functional annotations corresponding to the GenBank hits.
  • Table 3 shows the sequence identification numbers (SEQ ID NO:s) and template identification numbers (template IDs) corresponding to the polynucleotides of the present invention, along with polynucleotide segments of each template sequence as defined by the indicated “start” and “stop” nucleotide positions. The reading frames of the polynucleotide segments and the Pfam hits, Pfam descriptions, and E-values corresponding to the polypeptide domains encoded by the polynucleotide segments are indicated.
  • Table 4 shows the sequence identification numbers (SEQ ID NO:s) and template identification numbers (template IDs) corresponding to the polynucleotides of the present invention, along with polynucleotide segments of each template sequence as defined by the indicated “start” and “stop” nucleotide positions.
  • The reading frames of the polynucleotide segments are shown, and the polypeptides encoded by the polynucleotide segments constitute either signal peptide (SP) or transmembrane (TM) domains, as indicated.
  • The membrane topology of the encoded polypeptide sequence is indicated as being transmembrane or on the cytosolic or non-cytosolic side of the cell membrane or organelle.
  • Table 5 shows the sequence identification numbers (SEQ ID NO:s) and template identification numbers (template IDs) corresponding to the polynucleotides of the present invention, along with component sequence identification numbers (component IDs) corresponding to each template.
  • The component sequences, which were used to assemble the template sequences, are defined by the indicated “start” and “stop” nucleotide positions along each template.
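Segments defined by "start" and "stop" nucleotide positions along a template, as in Tables 3 through 5, can be recovered with a small helper. The sketch below assumes that positions are 1-based and inclusive, which the text does not state; the function name and the toy template are hypothetical.

```python
def extract_segment(template: str, start: int, stop: int) -> str:
    """Return the segment of template between start and stop.

    Assumption: coordinates are 1-based and inclusive at both ends.
    """
    if not (1 <= start <= stop <= len(template)):
        raise ValueError("start/stop must satisfy 1 <= start <= stop <= len(template)")
    # Convert the 1-based inclusive range to Python's 0-based half-open slice.
    return template[start - 1:stop]


if __name__ == "__main__":
    toy_template = "ACGTACGTAC"  # illustrative only, not a sequence of the invention
    print(extract_segment(toy_template, 3, 6))
```

If the tables instead use 0-based or half-open coordinates, only the slice arithmetic changes.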
  • Table 6 shows the tissue distribution profiles for the templates of the invention.
  • Table 7 shows the sequence identification numbers (SEQ ID NO:s) corresponding to the polypeptides of the present invention, along with the reading frames used to obtain the polypeptide segments, the lengths of the polypeptide segments, the “start” and “stop” nucleotide positions of the polynucleotide sequences used to define the encoded polypeptide segments, the GenBank hits (GI Numbers), probability scores, and functional annotations corresponding to the GenBank hits.
  • Table 8 summarizes the bioinformatics tools which are useful for analysis of the polynucleotides of the present invention.
  • The first column of Table 8 lists analytical tools, programs, and algorithms; the second column provides brief descriptions thereof; the third column presents appropriate references, all of which are incorporated by reference herein in their entirety; and the fourth column presents, where applicable, the scores, probability values, and other parameters used to evaluate the strength of a match between two sequences (the higher the score, the greater the homology between two sequences).
  • dithp refers to a nucleic acid sequence
  • DITHP refers to an amino acid sequence encoded by dithp.
  • A "full-length" dithp refers to a nucleic acid sequence containing the entire coding region of a gene endogenously expressed in human tissue.
  • Adjuvants are materials such as Freund's adjuvant, mineral gels (aluminum hydroxide), and surface active substances (lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, keyhole limpet hemocyanin, and dinitrophenol) which may be administered to increase a host's immunological response.
  • Allele refers to an alternative form of a nucleic acid sequence. Alleles result from a "mutation," a change or an alternative reading of the genetic code. Any given gene may have none, one, or many allelic forms. Mutations which give rise to alleles include deletions, additions, or substitutions of nucleotides. Each of these changes may occur alone, or in combination with the others, one or more times in a given nucleic acid sequence.
  • The present invention encompasses allelic dithp.
  • Allelic variant is an alternative form of the gene encoding DITHP.
  • Allelic variants may result from at least one mutation in the nucleic acid sequence and may result in altered mRNAs or in polypeptides whose structure or function may or may not be altered.
  • A gene may have none, one, or many allelic variants of its naturally occurring form.
  • Common mutational changes which give rise to allelic variants are generally ascribed to natural deletions, additions, or substitutions of nucleotides. Each of these types of changes may occur alone, or in combination with the others, one or more times in a given sequence.
  • Altered nucleic acid sequences encoding DITHP include those sequences with deletions, insertions, or substitutions of different nucleotides, resulting in a polypeptide the same as DITHP or a polypeptide with at least one functional characteristic of DITHP. Included within this definition are polymorphisms which may or may not be readily detectable using a particular oligonucleotide probe of the polynucleotide encoding DITHP, and improper or unexpected hybridization to allelic variants, with a locus other than the normal chromosomal locus for the polynucleotide sequence encoding DITHP.
  • The encoded protein may also be "altered," and may contain deletions, insertions, or substitutions of amino acid residues which produce a silent change and result in a functionally equivalent DITHP.
  • Deliberate amino acid substitutions may be made on the basis of similarity in polarity, charge, solubility, hydrophobicity, hydrophilicity, and/or the amphipathic nature of the residues, as long as the biological or immunological activity of DITHP is retained.
  • Negatively charged amino acids may include aspartic acid and glutamic acid.
  • Positively charged amino acids may include lysine and arginine.
  • Amino acids with uncharged polar side chains having similar hydrophilicity values may include: asparagine and glutamine; and serine and threonine.
  • Amino acids with uncharged side chains having similar hydrophilicity values may include: leucine, isoleucine, and valine; glycine and alanine; and phenylalanine and tyrosine.
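The residue groupings listed above can be encoded as a lookup for checking whether a substitution stays within one group. This is a minimal sketch: the groups are taken directly from the preceding text using standard one-letter amino acid codes, while the names `SIMILAR_GROUPS` and `is_similar_substitution` are hypothetical, and membership in a group does not by itself establish that biological or immunological activity is retained.

```python
# Groups from the text above, in standard one-letter codes:
# Asp/Glu, Lys/Arg, Asn/Gln, Ser/Thr, Leu/Ile/Val, Gly/Ala, Phe/Tyr.
SIMILAR_GROUPS = [
    frozenset("DE"),
    frozenset("KR"),
    frozenset("NQ"),
    frozenset("ST"),
    frozenset("LIV"),
    frozenset("GA"),
    frozenset("FY"),
]


def is_similar_substitution(original: str, replacement: str) -> bool:
    """True if both residues are identical or fall within the same listed group."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return True
    return any(a in group and b in group for group in SIMILAR_GROUPS)


if __name__ == "__main__":
    print(is_similar_substitution("D", "E"))  # aspartic acid -> glutamic acid
    print(is_similar_substitution("K", "D"))  # lysine -> aspartic acid
```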
  • Amino acid sequence refers to a peptide, a polypeptide, or a protein of either natural or synthetic origin.
  • The amino acid sequence is not limited to the complete, endogenous amino acid sequence and may be a fragment, epitope, variant, or derivative of a protein expressed by a nucleic acid sequence.
  • Amplification refers to the production of additional copies of a sequence and is carried out using polymerase chain reaction (PCR) technologies well known in the art.
  • Antibody refers to intact molecules as well as to fragments thereof, such as Fab, F(ab')2, and Fv fragments, which are capable of binding the epitopic determinant.
  • Antibodies that bind DITHP polypeptides can be prepared using intact polypeptides or using fragments containing small peptides of interest as the immunizing antigen.
  • the polypeptide or peptide used to immunize an animal e.g., a mouse, a rat, or a rabbit
  • Carriers that are chemically coupled to peptides include bovine serum albumin, thyroglobulin, and keyhole limpet hemocyanin (KLH).
  • The coupled peptide is then used to immunize the animal.
  • The term "aptamer" refers to a nucleic acid or oligonucleotide molecule that binds to a specific molecular target. Aptamers are derived from an in vitro evolutionary process (e.g., SELEX (Systematic Evolution of Ligands by Exponential Enrichment), described in U.S. Patent No. 5,270,163), which selects for target-specific aptamer sequences from large combinatorial libraries.
  • Aptamer compositions may be double-stranded or single-stranded, and may include deoxyribonucleotides, ribonucleotides, nucleotide derivatives, or other nucleotide-like molecules.
  • The nucleotide components of an aptamer may have modified sugar groups (e.g., the 2'-OH group of a ribonucleotide may be replaced by 2'-F or 2'-NH2), which may improve a desired property, e.g., resistance to nucleases or longer lifetime in blood.
  • Aptamers may be conjugated to other molecules, e.g., a high molecular weight carrier to slow clearance of the aptamer from the circulatory system.
  • Aptamers may be specifically cross-linked to their cognate ligands, e.g., by photo-activation of a cross-linker. (See, e.g., Brody, E.N. and L. Gold (2000) J. Biotechnol. 74:5-13.)
  • RNA aptamer refers to an aptamer which is expressed in vivo.
  • a vaccinia virus-based RNA expression system has been used to express specific RNA aptamers at high levels in the cytoplasm of leukocytes (Blind, M. et al. (1999) Proc. Natl Acad. Sci. USA 96:3606-3610).
  • spiegelmer refers to an aptamer which includes L-DNA, L-RNA, or other left- handed nucleotide derivatives or nucleotide-Hke molecules. Aptamers containing left-handed nucleotides are resistant to degradation by nataraUy occuning enzymes, which normaUy act on substrates containing right-handed nucleotides.
  • Antisense sequence refers to a sequence capable of specificaUy hybridizing to a target sequence.
  • the antisense sequence may include DNA, RNA, or any nucleic acid mimic or analog such as peptide nucleic acid (PNA); oHgonucleotides having modified backbone linkages such as phosphorothioates, methylphosphonates, orbenzylphosphonates; oHgonucleotides having modified sugar groups such as 2'-methoxyethyl sugars or 2'-methoxyethoxy sugars; or oHgonucleotides having modified bases such as 5-methyl cytosine, 2'-deoxyuracil, or 7-deaza-2'-deoxyguanosine.
  • Antisense technology refers to any technology which reHes on the specific hybridization of an antisense sequence to a target sequence.
  • A “bin” is a portion of computer memory space used by a computer program for storage of data, and bounded in such a manner that data stored in a bin may be retrieved by the program.
  • “Biologically active” refers to an amino acid sequence having a structural, regulatory, or biochemical function of a naturally occurring amino acid sequence.
  • “Clone joining” is a process for combining gene bins based upon the bins' containing sequence information from the same clone.
  • The sequences may assemble into a primary gene transcript as well as one or more splice variants.
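Clone joining as defined above can be sketched as a grouping problem: bins that hold sequence information from the same clone are merged into one bin. The following Python sketch is illustrative only. The function name `clone_join`, the bin names, and the clone identifiers are hypothetical, and the union-find approach is an assumption rather than the method used by any program named in this specification.

```python
from collections import defaultdict


def clone_join(bins):
    """Merge gene bins that share at least one clone ID (hypothetical helper).

    `bins` maps a bin name to the set of clone IDs whose sequences it holds.
    Returns a sorted list of (merged bin names, merged clone IDs) pairs.
    """
    parent = {name: name for name in bins}

    def find(name):
        # Follow parent links to the representative bin, shortening the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    # Link every bin to the first bin seen for each clone it contains.
    first_bin_for_clone = {}
    for name, clones in bins.items():
        for clone in clones:
            if clone in first_bin_for_clone:
                parent[find(name)] = find(first_bin_for_clone[clone])
            else:
                first_bin_for_clone[clone] = name

    # Collect bin names and clone IDs under each representative.
    merged = defaultdict(lambda: (set(), set()))
    for name, clones in bins.items():
        names, all_clones = merged[find(name)]
        names.add(name)
        all_clones.update(clones)
    return sorted((sorted(n), sorted(c)) for n, c in merged.values())


example_bins = {
    "binA": {"clone1", "clone2"},
    "binB": {"clone2", "clone3"},
    "binC": {"clone9"},
}
joined = clone_join(example_bins)
```

In this example, binA and binB are merged because both contain sequence from clone2, while binC stays separate.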
  • “Complementary” describes the relationship between two single-stranded nucleic acid sequences that anneal by base-pairing (5'-A-G-T-3' pairs with its complement 3'-T-C-A-5').
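The base-pairing relationship in the definition above can be illustrated with a short Python sketch. The function names are hypothetical and the sketch assumes unmodified DNA bases only (A, C, G, T).

```python
# Watson-Crick pairing for unmodified DNA bases.
_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq):
    """Return the paired bases position by position (read 3'->5')."""
    return "".join(_PAIR[base] for base in seq.upper())


def reverse_complement(seq):
    """Return the complementary strand written in the usual 5'->3' direction."""
    return complement(seq)[::-1]


def are_complementary(strand_a, strand_b):
    """True if two strands, both written 5'->3', anneal along their full length."""
    return reverse_complement(strand_a) == strand_b.upper()


paired = complement("AGT")            # 3'-TCA-5', as in the example above
other_strand = reverse_complement("AGT")  # the same strand written 5'->3'
```

Writing both strands 5'->3' is a common convention in sequence files, which is why the check compares against the reverse complement.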
  • a “component sequence” is a nucleic acid sequence selected by a computer program such as PHRED and used to assemble a consensus or template sequence from one or more component sequences.
  • A “consensus sequence” or “template sequence” is a nucleic acid sequence which has been assembled from overlapping sequences, using a computer program for fragment assembly such as the GELVIEW fragment assembly system (Genetics Computer Group (GCG), Madison WI) or using a relational database management system (RDMS).
  • “Conservative amino acid substitutions” are those substitutions that, when made, least interfere with the properties of the original protein, i.e., the structure and especially the function of the protein is conserved and not significantly changed by such substitutions.
  • The table below shows amino acids which may be substituted for an original amino acid in a protein and which are regarded as conservative substitutions.
  • Conservative substitutions generally maintain (a) the structure of the polypeptide backbone in the area of the substitution, for example, as a beta sheet or alpha helical conformation, (b) the charge or hydrophobicity of the molecule at the target site, or (c) the bulk of the side chain.
  • “Deletion” refers to a change in either a nucleic or amino acid sequence in which at least one nucleotide or amino acid residue, respectively, is absent.
  • Derivative refers to the chemical modification of a nucleic acid sequence, such as by replacement of hydrogen by an alkyl, acyl, amino, hydroxyl, or other group.
  • “Differential expression” refers to increased or upregulated; or decreased, downregulated, or absent gene or protein expression, determined by comparing at least two different samples. Such comparisons may be carried out between, for example, a treated and an untreated sample, or a diseased and a normal sample.
  • “Element” and “array element” refer to a polynucleotide, polypeptide, or other chemical compound having a unique and defined position on a microarray.
  • modulate refers to a change in the activity of DITHP.
  • modulation may cause an increase or a decrease in protein activity, binding characteristics, or any other biological, functional, or immunological properties of DITHP.
  • “E-value” refers to the statistical probability that a match between two sequences occurred by chance.
  • “Exon shuffling” refers to the recombination of different coding regions (exons). Since an exon may represent a structural or functional domain of the encoded protein, new proteins may be assembled through the novel reassortment of stable substructures, thus allowing acceleration of the evolution of new protein functions.
  • A “fragment” is a unique portion of dithp or DITHP which is identical in sequence to but shorter in length than the parent sequence. A fragment may comprise up to the entire length of the defined sequence, minus one nucleotide/amino acid residue.
  • a fragment may comprise from 10 to 1000 contiguous amino acid residues or nucleotides.
  • A fragment used as a probe, primer, antigen, therapeutic molecule, or for other purposes may be at least 5, 10, 15, 16, 20, 25, 30, 40, 50, 60, 75, 100, 150, 250 or at least 500 contiguous amino acid residues or nucleotides in length. Fragments may be preferentially selected from certain regions of a molecule.
  • A polypeptide fragment may comprise a certain length of contiguous amino acids selected from the first 250 or 500 amino acids (or first 25% or 50%) of a polypeptide as shown in a certain defined sequence. Clearly these lengths are exemplary, and any length that is supported by the specification, including the Sequence Listing and the figures, may be encompassed by the present embodiments.
  • A fragment of dithp comprises a region of unique polynucleotide sequence that specifically identifies dithp, for example, as distinct from any other sequence in the same genome.
  • A fragment of dithp is useful, for example, in hybridization and amplification technologies and in analogous methods that distinguish dithp from related polynucleotide sequences.
  • The precise length of a fragment of dithp and the region of dithp to which the fragment corresponds are routinely determinable by one of ordinary skill in the art based on the intended purpose for the fragment.
  • A fragment of DITHP is encoded by a fragment of dithp.
  • A fragment of DITHP comprises a region of unique amino acid sequence that specifically identifies DITHP.
  • A fragment of DITHP is useful as an immunogenic peptide for the development of antibodies that specifically recognize DITHP.
  • The precise length of a fragment of DITHP and the region of DITHP to which the fragment corresponds are routinely determinable by one of ordinary skill in the art based on the intended purpose for the fragment.
  • A “full length” nucleotide sequence is one containing at least a start site for translation to a protein sequence, followed by an open reading frame and a stop site, and encoding a “full length” polypeptide.
  • “Hit” refers to a sequence whose annotation will be used to describe a given template. Criteria for selecting the top hit are as follows: if the template has one or more exact nucleic acid matches, the top hit is the exact match with highest percent identity. If the template has no exact matches but has significant protein hits, the top hit is the protein hit with the lowest E-value. If the template has no significant protein hits, but does have significant non-exact nucleotide hits, the top hit is the nucleotide hit with the lowest E-value.
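The three-tier top-hit criteria above can be sketched in Python as follows. The `Hit` record, its field names, and the `significance` cutoff are assumptions made for illustration; the text above does not state what E-value counts as significant, so the default used here is a placeholder only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    accession: str            # hypothetical identifier, e.g. a GI number
    kind: str                 # "nucleotide" or "protein"
    exact: bool               # True for an exact nucleic acid match
    percent_identity: float
    e_value: float


def top_hit(hits: List[Hit], significance: float = 1e-8) -> Optional[Hit]:
    """Pick the hit whose annotation describes the template.

    `significance` is an assumed placeholder cutoff, not a value from the text.
    """
    # Tier 1: exact nucleic acid matches, highest percent identity wins.
    exact = [h for h in hits if h.kind == "nucleotide" and h.exact]
    if exact:
        return max(exact, key=lambda h: h.percent_identity)

    # Tier 2: significant protein hits, lowest E-value wins.
    proteins = [h for h in hits
                if h.kind == "protein" and h.e_value <= significance]
    if proteins:
        return min(proteins, key=lambda h: h.e_value)

    # Tier 3: significant non-exact nucleotide hits, lowest E-value wins.
    nucleotides = [h for h in hits
                   if h.kind == "nucleotide" and not h.exact
                   and h.e_value <= significance]
    if nucleotides:
        return min(nucleotides, key=lambda h: h.e_value)
    return None


candidates = [
    Hit("p1", "protein", False, 45.0, 1e-30),
    Hit("n1", "nucleotide", False, 88.0, 1e-50),
    Hit("p2", "protein", False, 60.0, 1e-12),
]
best = top_hit(candidates)
```

Here no exact match exists, so the protein hit with the lowest E-value (p1) is selected even though a nucleotide hit has a lower E-value.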
  • “Homology” refers to sequence similarity either between a reference nucleic acid sequence and at least a fragment of a dithp or between a reference amino acid sequence and a fragment of a DITHP.
  • “Hybridization” refers to the process by which a strand of nucleotides anneals with a complementary strand through base pairing. Specific hybridization is an indication that two nucleic acid sequences share a high degree of identity. Specific hybridization complexes form under defined annealing conditions, and remain hybridized after the “washing” step.
  • The defined hybridization conditions include the annealing conditions and the washing step(s), the latter of which is particularly important in determining the stringency of the hybridization process, with more stringent conditions allowing less non-specific binding, i.e., binding between pairs of nucleic acid probes that are not perfectly matched. Permissive conditions for annealing of nucleic acid sequences are routinely determinable and may be consistent among hybridization experiments, whereas wash conditions may be varied among experiments to achieve the desired stringency.
  • Generally, stringency of hybridization is expressed with reference to the temperature under which the wash step is carried out.
  • Generally, such wash temperatures are selected to be about 5°C to 20°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH.
  • The Tm is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe.
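The relationship between Tm and wash temperature stated above is simple arithmetic, sketched here in Python. The function name is hypothetical, and the Tm value in the example is an arbitrary input chosen for illustration, not a value taken from this specification.

```python
def wash_temperature_range(tm_celsius, min_offset=5.0, max_offset=20.0):
    """Return (lowest, highest) wash temperature in degrees C.

    Applies the rule that wash temperatures are about 5 to 20 degrees C
    below the thermal melting point (Tm) of the specific sequence.
    """
    if min_offset > max_offset:
        raise ValueError("min_offset must not exceed max_offset")
    return (tm_celsius - max_offset, tm_celsius - min_offset)


# Example with an assumed Tm of 80 degrees C.
low, high = wash_temperature_range(80.0)
```

A wash nearer the upper end of the range is more stringent, since fewer imperfectly matched duplexes remain hybridized.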
  • High stringency conditions for hybridization between polynucleotides of the present invention include wash conditions of 68°C in the presence of about 0.2 x SSC and about 0.1% SDS, for 1 hour. Alternatively, temperatures of about 65°C, 60°C, or 55°C may be used. SSC concentration may be varied from about 0.2 to 2 x SSC, with SDS being present at about 0.1%.
  • Typically, blocking reagents are used to block non-specific hybridization. Such blocking reagents include, for instance, denatured salmon sperm DNA at about 100-200 µg/ml. Useful variations on these conditions will be readily apparent to those skilled in the art. Hybridization, particularly under high stringency conditions, may be suggestive of evolutionary similarity between the nucleotides. Such similarity is strongly indicative of a similar role for the nucleotides and their resultant proteins.
  • Other hybridization conditions may also be used under particular circumstances, such as RNA:DNA hybridizations. Appropriate hybridization conditions are routinely determinable by one of ordinary skill in the art.
  • “Immunologically active” or “immunogenic” describes the potential for a natural, recombinant, or synthetic peptide, epitope, polypeptide, or protein to induce antibody production in appropriate animals, cells, or cell lines.
  • “Immunogenic response” can refer to conditions associated with inflammation, trauma, immune disorders, or infectious or genetic disease, etc. These conditions can be characterized by expression of various factors, e.g., cytokines, chemokines, and other signaling molecules, which may affect cellular and systemic defense systems.
  • An “immunogenic fragment” is a polypeptide or oligopeptide fragment of DITHP which is capable of eliciting an immune response when introduced into a living organism, for example, a mammal.
  • The term “immunogenic fragment” also includes any polypeptide or oligopeptide fragment of DITHP which is useful in any of the antibody production methods disclosed herein or known in the art.
  • “Insertion” or “addition” refers to a change in either a nucleic or amino acid sequence in which at least one nucleotide or residue, respectively, is added to the sequence.
  • “Labeling” refers to the covalent or noncovalent joining of a polynucleotide, polypeptide, or antibody with a reporter molecule capable of producing a detectable or measurable signal.
  • “Microarray” is any arrangement of nucleic acids, amino acids, antibodies, etc., on a substrate.
  • The substrate may be a solid support such as beads, glass, paper, nitrocellulose, nylon, or an appropriate membrane.
  • “Linkers” are short stretches of nucleotide sequence which may be added to a vector or a dithp to create restriction endonuclease sites to facilitate cloning.
  • “Polylinkers” are engineered to incorporate multiple restriction enzyme sites and to provide for the use of enzymes which leave 5' or 3' overhangs (e.g., BamHI, EcoRI, and HindIII) and those which provide blunt ends (e.g., EcoRV, SnaBI, and StuI).
  • “Naturally occurring” refers to an endogenous polynucleotide or polypeptide that may be isolated from viruses or prokaryotic or eukaryotic cells.
  • “Nucleic acid sequence” refers to the specific order of nucleotides joined by phosphodiester bonds in a linear, polymeric arrangement. Depending on the number of nucleotides, the nucleic acid sequence can be considered an oligomer, oligonucleotide, or polynucleotide.
  • The nucleic acid can be DNA, RNA, or any nucleic acid analog, such as PNA; may be of genomic or synthetic origin; may be either double-stranded or single-stranded; and can represent either the sense or antisense (complementary) strand.
  • operably linked refers to the situation in which a first nucleic acid sequence is placed in a functional relationship with the second nucleic acid sequence.
  • a promoter is operably linked to a coding sequence if the promoter affects the transcription or expression of the coding sequence.
  • Generally, operably linked DNA sequences may be in close proximity or contiguous and, where necessary to join two protein coding regions, in the same reading frame.
  • “Percent identity” and “% identity” refer to the percentage of residue matches between at least two polynucleotide sequences aligned using a standardized algorithm. Such an algorithm may insert, in a standardized and reproducible way, gaps in the sequences being compared in order to optimize alignment between two sequences, and therefore achieve a more meaningful comparison of the two sequences.
  • NCBI: National Center for Biotechnology Information
  • BLAST: Basic Local Alignment Search Tool
  • The BLAST software suite includes various sequence analysis programs including “blastn,” that is used to determine alignment between a known polynucleotide sequence and other sequences on a variety of databases.
  • “BLAST 2 Sequences” is used for direct pairwise comparison of two nucleotide sequences.
  • “BLAST 2 Sequences” can be accessed and used interactively at http://www.ncbi.nlm.nih.gov/gorf/bl2/. The “BLAST 2 Sequences” tool can be used for both blastn and blastp (discussed below). BLAST programs are commonly used with gap and other parameters set to default settings. For example, to compare two nucleotide sequences, one may use blastn with the “BLAST 2 Sequences” tool Version 2.0.9 (May-07-1999) set at default parameters. Such default parameters may be, for example: Matrix: BLOSUM62
  • Percent identity may be measured over the length of an entire defined sequence, for example, as defined by a particular SEQ ID number, or may be measured over a shorter length, for example, over the length of a fragment taken from a larger, defined sequence, for instance, a fragment of at least 20, at least 30, at least 40, at least 50, at least 70, at least 100, or at least 200 contiguous nucleotides.
  • Such lengths are exemplary only, and it is understood that any fragment length supported by the sequences shown herein, in figures or Sequence Listings, may be used to describe a length over which percentage identity may be measured.
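As a rough illustration of how a percentage of residue matches may be computed once two sequences have been aligned, the following Python sketch counts matching columns in a gapped pairwise alignment. It is not the BLAST implementation. The function name is hypothetical, and the choice of denominator (all alignment columns, including gapped ones) is an assumption, since tools differ on this point.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns in which both sequences carry the same residue.

    Both inputs must already be aligned (equal length, gaps marked with `gap`).
    Columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue  # an all-gap column carries no information
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Six columns: four matches, one mismatch (G/C), one gap (-/T).
identity = percent_identity("ACGT-A", "ACCTTA")
```

Restricting the inputs to an aligned fragment gives the percent identity over that shorter length, as described above.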
  • Nucleic acid sequences that do not show a high degree of identity may nevertheless encode similar amino acid sequences due to the degeneracy of the genetic code. It is understood that changes in nucleic acid sequence can be made using this degeneracy to produce multiple nucleic acid sequences that all encode substantially the same protein.
  • “Percent identity” and “% identity”, as applied to polypeptide sequences, refer to the percentage of residue matches between at least two polypeptide sequences aligned using a standardized algorithm. Methods of polypeptide sequence alignment are well-known. Some alignment methods take into account conservative amino acid substitutions. Such conservative substitutions, explained in more detail above, generally preserve the hydrophobicity and acidity of the substituted residue, thus preserving the structure (and therefore function) of the folded polypeptide. Percent identity between polypeptide sequences may be determined using the default parameters of the CLUSTAL V algorithm as incorporated into the MEGALIGN version 3.12e sequence alignment program (described and referenced above).
  • The PAM250 matrix is selected as the default residue weight table. As with polynucleotide alignments, the percent identity is reported by
  • NCBI BLAST software suite may be used.
  • BLAST 2 Sequences Version 2.0.9 (May-07-1999) with blastp set at default parameters.
  • Such default parameters may be, for example:
  • Percent identity may be measured over the length of an entire defined polypeptide sequence, for example, as defined by a particular SEQ ID number, or may be measured over a shorter length, for example, over the length of a fragment taken from a larger, defined polypeptide sequence, for instance, a fragment of at least 15, at least 20, at least 30, at least 40, at least 50, at least 70 or at least 150 contiguous residues.
  • Such lengths are exemplary only, and it is understood that any fragment length supported by the sequences shown herein, in figures or Sequence Listings, may be used to describe a length over which percentage identity may be measured.
  • “Post-translational modification” of a DITHP may involve lipidation, glycosylation, phosphorylation, acetylation, racemization, proteolytic cleavage, and other modifications known in the art. These processes may occur synthetically or biochemically. Biochemical modifications will vary by cell type depending on the enzymatic milieu and the DITHP.
  • “Probe” refers to dithp or fragments thereof, which are used to detect identical, allelic or related nucleic acid sequences.
  • Probes are isolated oligonucleotides or polynucleotides attached to a detectable label or reporter molecule. Typical labels include radioactive isotopes, ligands, chemiluminescent agents, and enzymes.
  • “Primers” are short nucleic acids, usually DNA oligonucleotides, which may be annealed to a target polynucleotide by complementary base-pairing. The primer may then be extended along the target DNA strand by a DNA polymerase enzyme. Primer pairs can be used for amplification (and identification) of a nucleic acid sequence, e.g., by the polymerase chain reaction (PCR).
  • Probes and primers as used in the present invention typically comprise at least 15 contiguous nucleotides of a known sequence. In order to enhance specificity, longer probes and primers may also be employed, such as probes and primers that comprise at least 20, 30, 40, 50, 60, 70, 80, 90, 100, or at least 150 consecutive nucleotides of the disclosed nucleic acid sequences. Probes and primers may be considerably longer than these examples, and it is understood that any length supported by the specification, including the figures and Sequence Listing, may be used.
  • PCR primer pairs can be derived from a known sequence, for example, by using computer programs intended for that purpose such as Primer (Version 0.5, 1991, Whitehead Institute for Biomedical Research, Cambridge MA).
  • Oligonucleotides for use as primers are selected using software known in the art for such purpose.
  • OLIGO 4.06 software is useful for the selection of PCR primer pairs of up to 100 nucleotides each, and for the analysis of oligonucleotides and larger polynucleotides of up to 5,000 nucleotides from an input polynucleotide sequence of up to 32 kilobases.
  • Similar primer selection programs have incorporated additional features for expanded capabilities.
  • The PrimOU primer selection program (available to the public from the Genome Center at University of Texas South West Medical Center, Dallas TX) is capable of choosing specific primers from megabase sequences and is thus useful for designing primers on a genome-wide scope.
  • The Primer3 primer selection program (available to the public from the Whitehead Institute/MIT Center for Genome Research, Cambridge MA) allows the user to input a “mispriming library,” in which sequences to avoid as primer binding sites are user-specified. Primer3 is useful, in particular, for the selection of oligonucleotides for microarrays.
  • the source code for the latter two primer selection programs may also be obtained from their respective sources and modified to meet the user's specific needs.
  • The PrimeGen program (available to the public from the UK Human Genome Mapping Project Resource Centre, Cambridge UK) designs primers based on multiple sequence alignments, thereby allowing selection of primers that hybridize to either the most conserved or least conserved regions of aligned nucleic acid sequences. Hence, this program is useful for identification of both unique and conserved oligonucleotides and polynucleotide fragments.
  • Oligonucleotides and polynucleotide fragments identified by any of the above selection methods are useful in hybridization technologies, for example, as PCR or sequencing primers, microarray elements, or specific probes to identify fully or partially complementary polynucleotides in a sample of nucleic acids. Methods of oligonucleotide selection are not limited to those described above.
  • “Purified” refers to molecules, either polynucleotides or polypeptides, that are isolated or separated from their natural environment and are at least 60% free, preferably at least 75% free, and most preferably at least 90% free from other compounds with which they are naturally associated.
  • A “recombinant nucleic acid” is a sequence that is not naturally occurring or has a sequence that is made by an artificial combination of two or more otherwise separated segments of sequence.
  • recombinant includes nucleic acids that have been altered solely by addition, substitution, or deletion of a portion of the nucleic acid.
  • a recombinant nucleic acid may include a nucleic acid sequence operably linked to a promoter sequence.
  • Such a recombinant nucleic acid may be part of a vector that is used, for example, to transform a cell.
  • Such recombinant nucleic acids may be part of a viral vector, e.g., based on a vaccinia virus, that could be used to vaccinate a mammal wherein the recombinant nucleic acid is expressed, inducing a protective immunological response in the mammal.
  • “Regulatory element” refers to a nucleic acid sequence from nontranslated regions of a gene, and includes enhancers, promoters, introns, and 3' untranslated regions, which interact with host proteins to carry out or regulate transcription or translation.
  • “Reporter molecules” are chemical or biochemical moieties used for labeling a nucleic acid, an amino acid, or an antibody. They include radionuclides; enzymes; fluorescent, chemiluminescent, or chromogenic agents; substrates; cofactors; inhibitors; magnetic particles; and other moieties known in the art.
  • An “RNA equivalent,” in reference to a DNA sequence, is composed of the same linear sequence of nucleotides as the reference DNA sequence with the exception that all occurrences of the nitrogenous base thymine are replaced with uracil, and the sugar backbone is composed of ribose instead of deoxyribose.
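At the level of sequence text, the RNA equivalent defined above amounts to replacing every thymine with uracil. A minimal Python sketch follows. The function name is hypothetical, and accepting N for unidentified bases is an assumption made so that sequences containing such bases can still be converted.

```python
def rna_equivalent(dna):
    """Return the RNA equivalent of a DNA sequence string (T replaced by U).

    The change from deoxyribose to ribose is chemical and is not visible
    in the sequence text. N (unidentified base) is passed through unchanged.
    """
    dna = dna.upper()
    unexpected = set(dna) - set("ACGTN")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return dna.replace("T", "U")


rna = rna_equivalent("ATGCTTN")
```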
  • Samples may contain nucleic or amino acids, antibodies, or other materials, and may be derived from any source (e.g., bodily fluids including, but not limited to, saliva, blood, and urine; chromosome(s), organelles, or membranes isolated from a cell; genomic DNA, RNA, or cDNA in solution or bound to a substrate; and cleared cells or tissues or blots or imprints from such cells or tissues).
  • “Specific binding” or “specifically binding” refers to the interaction between a protein or peptide and its agonist, antibody, antagonist, or other binding partner. The interaction is dependent upon the presence of a particular structure of the protein, e.g., the antigenic determinant or epitope, recognized by the binding molecule. For example, if an antibody is specific for epitope “A,” the presence of a polypeptide containing epitope A, or the presence of free unlabeled A, in a reaction containing free labeled A and the antibody will reduce the amount of labeled A that binds to the antibody.
  • “Substitution” refers to the replacement of at least one nucleotide or amino acid by a different nucleotide or amino acid.
  • “Substrate” refers to any suitable rigid or semi-rigid support including, e.g., membranes, filters, chips, slides, wafers, fibers, magnetic or nonmagnetic beads, gels, tubing, plates, polymers, microparticles or capillaries.
  • The substrate can have a variety of surface forms, such as wells, trenches, pins, channels and pores, to which polynucleotides or polypeptides are bound.
  • A “transcript image” or “expression profile” refers to the collective pattern of gene expression by a particular cell type or tissue under given conditions at a given time.
  • “Transformation” refers to a process by which exogenous DNA enters a recipient cell.
  • Transformation may occur under natural or artificial conditions using various methods well known in the art. Transformation may rely on any known method for the insertion of foreign nucleic acid sequences into a prokaryotic or eukaryotic host cell. The method is selected based on the host cell being transformed.
  • Transformants include stably transformed cells in which the inserted DNA is capable of replication either as an autonomously replicating plasmid or as part of the host chromosome, as well as cells which transiently express inserted DNA or RNA.
  • A “transgenic organism,” as used herein, is any organism, including but not limited to animals and plants, in which one or more of the cells of the organism contains heterologous nucleic acid introduced by way of human intervention, such as by transgenic techniques well known in the art.
  • The nucleic acid is introduced into the cell, directly or indirectly by introduction into a precursor of the cell, by way of deliberate genetic manipulation, such as by microinjection or by infection with a recombinant virus.
  • The term genetic manipulation does not include classical cross-breeding, or in vitro fertilization, but rather is directed to the introduction of a recombinant DNA molecule.
  • The transgenic organisms contemplated in accordance with the present invention include bacteria, cyanobacteria, fungi, and plants and animals.
  • The isolated DNA of the present invention can be introduced into the host by methods known in the art, for example infection, transfection, transformation or transconjugation. Techniques for transferring the DNA of the present invention into such organisms are widely known and provided in references such as Sambrook et al. (1989), supra.
  • A “variant” of a particular nucleic acid sequence is defined as a nucleic acid sequence having at least 25% sequence identity to the particular nucleic acid sequence over a certain length of one of the nucleic acid sequences using blastn with the “BLAST 2 Sequences” tool Version 2.0.9 (May-07-1999) set at default parameters.
  • Such a pair of nucleic acids may show, for example, at least 30%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% or greater sequence identity over a certain defined length.
  • the variant may result in "conservative" amino acid changes which do not affect structural and/or chemical properties.
  • A variant may be described as, for example, an “allelic” (as defined above), “splice,” “species,” or “polymorphic” variant.
  • A splice variant may have significant identity to a reference molecule, but will generally have a greater or lesser number of polynucleotides due to alternate splicing of exons during mRNA processing.
  • the corresponding polypeptide may possess additional functional domains or lack domains that are present in the reference molecule.
  • Species variants are polynucleotide sequences that vary from one species to another.
  • A polymorphic variant is a variation in the polynucleotide sequence of a particular gene between individuals of a given species. Polymorphic variants also may encompass “single nucleotide polymorphisms” (SNPs) in which the polynucleotide sequence varies by one base. The presence of SNPs may be indicative of, for example, a certain population, a disease state, or a propensity for a disease state.
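Single-base differences of the kind described above can be located by comparing two sequences position by position. The Python sketch below is illustrative only. It assumes the two sequences are already aligned, have equal length, and contain no insertions or deletions. The function name and the example sequences are hypothetical.

```python
def snp_positions(reference, variant):
    """List single-base differences between two equal-length sequences.

    Returns (1-based position, reference base, variant base) tuples.
    Positions where either sequence has N (unidentified base) are skipped.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    differences = []
    pairs = zip(reference.upper(), variant.upper())
    for position, (ref_base, var_base) in enumerate(pairs, start=1):
        if "N" in (ref_base, var_base):
            continue  # an unidentified base cannot confirm a difference
        if ref_base != var_base:
            differences.append((position, ref_base, var_base))
    return differences


snps = snp_positions("ACGTAC", "ACGAAC")
```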
  • Variants of the polynucleotides of the present invention may be generated through recombinant methods.
  • One possible method is a DNA shuffling technique such as MOLECULARBREEDING (Maxygen Inc., Santa Clara CA; described in U.S. Patent Number
  • DNA shuffling is a process by which a library of gene variants is produced using PCR-mediated recombination of gene fragments.
  • The library is then subjected to selection or screening procedures that identify those gene variants with the desired properties. These preferred variants may then be pooled and further subjected to recursive rounds of DNA shuffling and selection/screening.
  • Genetic diversity is created through “artificial” breeding and rapid molecular evolution. For example, fragments of a single gene containing random point mutations may be recombined, screened, and then reshuffled until the desired properties are optimized. Alternatively, fragments of a given gene may be recombined with fragments of homologous genes in the same gene family, either from the same or different species, thereby maximizing the genetic diversity of multiple naturally occurring genes in a directed and controllable manner.
  • A “variant” of a particular polypeptide sequence is defined as a polypeptide sequence having at least 40% sequence identity to the particular polypeptide sequence over a certain length of one of the polypeptide sequences using blastp with the “BLAST 2 Sequences” tool Version 2.0.9 (May-07-1999) set at default parameters.
  • Such a pair of polypeptides may show, for example, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% or greater identity over a certain defined length of one of the polypeptides.
  • cDNA sequences derived from human tissues and cell lines were aligned based on nucleotide sequence identity and assembled into “consensus” or “template” sequences which are designated by the template identification numbers (template IDs) in column 2 of Table 2.
  • The sequence identification numbers (SEQ ID NO:s) corresponding to the template IDs are shown in column 1.
  • The template sequences have similarity to GenBank sequences, or “hits,” as designated by the GI Numbers in column 3.
  • The statistical probability of each GenBank hit is indicated by a probability score in column 4, and the functional annotation corresponding to each GenBank hit is listed in column 5.
  • the invention incorporates the nucleic acid sequences of these templates as disclosed in the
  • Sequences of the present invention are used to develop a transcript image for a particular cell or tissue.
  • cDNA was isolated from libraries constructed using RNA derived from normal and diseased human tissues and cell lines.
  • The human tissues and cell lines used for cDNA library construction were selected from a broad range of sources to provide a diverse population of cDNAs representative of gene transcription throughout the human body. Descriptions of the human tissues and cell lines used for cDNA library construction are provided in the LIFESEQ database (Incyte Genomics, Inc. (Incyte), Palo Alto CA).
  • Human tissues were broadly selected from, for example, cardiovascular, dermatologic, endocrine, gastrointestinal, hematopoietic/immune system, musculoskeletal, neural, reproductive, and urologic sources.
  • Cell lines used for cDNA library construction were derived from, for example, leukemic cells, teratocarcinomas, neuroepitheliomas, cervical carcinoma, lung fibroblasts, and endothelial cells.
  • Such cell lines include, for example, THP-1, Jurkat, HUVEC, hNT2, WI38, HeLa, and other cell lines commonly used and available from public depositories (American Type Culture Collection, Manassas VA).
  • Prior to mRNA isolation, cell lines were untreated, treated with a pharmaceutical agent such as 5'-aza-2'-deoxycytidine, treated with an activating agent such as lipopolysaccharide in the case of leukocytic cell lines, or, in the case of endothelial cell lines, subjected to shear stress.
  • Sequencing of the cDNAs
  • Chain termination reaction products may be electrophoresed on urea-polyacrylamide gels and detected either by autoradiography (for radioisotope-labeled nucleotides) or by fluorescence (for fluorophore-labeled nucleotides).
  • Automated methods for mechanized reaction preparation, sequencing, and analysis using fluorescence detection methods have been developed.
  • Machines used to prepare cDNAs for sequencing can include the MICROLAB 2200 liquid transfer system (Hamilton Company (Hamilton), Reno NV), Peltier thermal cycler (PTC200; MJ Research, Inc.
  • Sequencing can be carried out using, for example, the ABI 373 or 377 (Applied Biosystems) or MEGABACE 1000 (Molecular Dynamics, Inc. (Molecular Dynamics), Sunnyvale CA) DNA sequencing systems, or other automated and manual sequencing systems well known in the art.
  • nucleotide sequences of the Sequence Listing have been prepared by current, state-of-the-art, automated methods and, as such, may contain occasional sequencing errors or unidentified nucleotides. Such unidentified nucleotides are designated by an N. These infrequent unidentified bases do not represent a hindrance to practicing the invention for those skilled in the art.
  • Several methods employing standard recombinant techniques may be used to correct errors and complete the missing sequence information. (See, e.g., those described in Ausubel, F.M. et al. (1997) Short Protocols in Molecular Biology, John Wiley & Sons, New York NY; and Sambrook, J. et al. (1989) Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Press, Plainview NY.)
  • Human polynucleotide sequences may be assembled using programs or algorithms well known in the art. Sequences to be assembled are related, wholly or in part, and may be derived from a single or many different transcripts. Assembly of the sequences can be performed using such programs as PHRAP (Phil's Revised Assembly Program) and the GELVIEW fragment assembly system (GCG), or other methods known in the art. Alternatively, cDNA sequences are used as "component" sequences that are assembled into "template" or "consensus" sequences as follows. Sequence chromatograms are processed and verified, and quality scores are obtained using PHRED. Raw sequences are edited using an editing pathway known as Block 1. (See, e.g., the LIFESEQ Assembled User Guide, Incyte Genomics, Palo Alto, CA.) A series of BLAST comparisons is performed, and low-information segments and repetitive elements (e.g., dinucleotide repeats, Alu repeats, etc.) are replaced by "n's", or masked, to prevent spurious matches. Mitochondrial and ribosomal RNA sequences are also removed.
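The masking step can be illustrated with a minimal Python sketch. It covers only dinucleotide repeats, using a regular expression rather than the BLAST comparisons named in the text; the function name and the `min_units` threshold are assumptions made for illustration and are not taken from the LIFESEQ pipeline.

```python
import re

def mask_dinucleotide_repeats(seq, min_units=6):
    """Replace runs of a tandemly repeated dinucleotide with 'n' characters.

    min_units is a hypothetical threshold: the minimum number of consecutive
    copies of the two-base unit that counts as a low-information segment.
    """
    # ([ACGT]{2}) captures one two-base unit; \1{k,} demands k further copies.
    pattern = re.compile(r"([ACGT]{2})\1{%d,}" % (min_units - 1), re.IGNORECASE)
    # Each match is replaced by the same number of 'n' characters,
    # so sequence coordinates are preserved.
    return pattern.sub(lambda m: "n" * len(m.group(0)), seq)

example = "ATGGCT" + "CA" * 8 + "GGTTAC"
print(mask_dinucleotide_repeats(example))
```

Keeping the masked sequence the same length as the input means that positions reported by later comparisons still refer to the original read.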
  • the processed sequences are then loaded into a relational database management system (RDMS) which assigns edited sequences to existing templates, if available.
  • a process is initiated which modifies existing templates or creates new templates from works in progress (i.e., nonfinal assembled sequences) containing queued sequences or the sequences themselves.
  • the templates can be merged into bins. If multiple templates exist in one bin, the bin can be split and the templates reannotated.
  • bins are "clone joined" based upon clone information. Clone joining occurs when the 5' sequence of one clone is present in one bin and the 3' sequence from the same clone is present in a different bin, indicating that the two bins should be merged into a single bin. Only bins which share at least two different clones are merged.
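The clone-joining rule can be sketched as follows, assuming each bin is represented as a set of (clone ID, read end) pairs. The data layout, the function name, and the use of a union-find structure are illustrative assumptions; the text specifies only the rule that bins sharing at least two different clones are merged.

```python
from itertools import combinations

def clone_join(bins, min_shared=2):
    """Group bin IDs that should be merged by clone joining.

    bins maps a bin ID to a set of (clone_id, end) pairs, where end is
    "5" or "3" (hypothetical representation). Two bins are joined when at
    least min_shared different clones have one end in each bin.
    """
    parent = {b: b for b in bins}

    def find(b):
        # Follow parent links to the representative bin, compressing the path.
        while parent[b] != b:
            parent[b] = parent[parent[b]]
            b = parent[b]
        return b

    for a, b in combinations(bins, 2):
        shared = set()
        for clone, end in bins[a]:
            other_end = "3" if end == "5" else "5"
            if (clone, other_end) in bins[b]:
                shared.add(clone)
        if len(shared) >= min_shared:
            parent[find(a)] = find(b)

    groups = {}
    for b in bins:
        groups.setdefault(find(b), set()).add(b)
    return list(groups.values())

example_bins = {
    "bin1": {("c1", "5"), ("c2", "5"), ("c3", "5")},
    "bin2": {("c1", "3"), ("c2", "3")},
    "bin3": {("c3", "3")},
}
print(clone_join(example_bins))
```

In this example bin1 and bin2 share two clones and are joined, whereas bin3 shares only one clone with bin1 and is left separate.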
  • a resultant template sequence may contain either a partial or a full length open reading frame, or all or part of a genetic regulatory element. This variation is due in part to the fact that the full length cDNAs of many genes are several hundred, and sometimes several thousand, bases in length. With current technology, cDNAs comprising the coding regions of large genes cannot be cloned because of vector limitations, incomplete reverse transcription of the mRNA, or incomplete "second strand" synthesis. Template sequences may be extended to include additional contiguous sequences derived from the parent RNA transcript using a variety of methods known to those of skill in the art.
  • Extension may thus be used to achieve the full length coding sequence of a gene.
  • cDNA sequences are analyzed using a variety of programs and algorithms which are well known in the art. (See, e.g., Ausubel, 1997, supra, Chapter 7.7; Meyers, R.A. (Ed.) (1995)
  • BLAST is especially useful in determining exact matches and comparing two sequence fragments of arbitrary but equal lengths, whose alignment is locally maximal and for which the alignment score meets or exceeds a threshold or cutoff score set by the user (Karlin, S. et al. (1988) Proc. Natl. Acad. Sci. USA 85:841-845).
  • an appropriate search tool e.g., BLAST or HMM
  • GenBank, SwissProt, BLOCKS, PFAM and other databases may be searched for sequences containing regions of homology to a query dithp or DITHP of the present invention.
  • Protein hierarchies can be assigned to the putative encoded polypeptide based on, e.g., motif, BLAST, or biological analysis. Methods for assigning these hierarchies are described, for example, in "Database System Employing Protein Function Hierarchies for Viewing Biomolecular Sequence Data," U.S.S.N. 08/812,290, filed March 6, 1997, incorporated herein by reference.
  • SEQ ID NO:57 and SEQ ID NO:58, encoded by SEQ ID NO:1 and SEQ ID NO:2, respectively, are, for example, human enzyme molecules.
  • SEQ ID NO:59, SEQ ID NO:60, and SEQ ID NO:61, encoded by SEQ ID NO:3, SEQ ID NO:4, and SEQ ID NO:5, respectively, are, for example, receptor molecules.
  • SEQ ID NO:62 and SEQ ID NO:63 encoded by SEQ ID NO:6 and SEQ ID NO:7, respectively, are, for example, intracellular signaling molecules.
  • SEQ ID NO:89 and SEQ ID NO:90 encoded by SEQ ID NO:33 and SEQ ID NO:34, respectively, are, for example, membrane transport molecules.
  • SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, and SEQ ID NO:38, respectively, are, for example, protein modification and maintenance molecules.
  • SEQ ID NO:95 encoded by SEQ ID NO:39 is, for example, an adhesion molecule.
  • SEQ ID NO:96 and SEQ ID NO:97 encoded by SEQ ID NO:40 and SEQ ID NO:41, respectively, are, for example, antigen recognition molecules.
  • SEQ ID NO:98 encoded by SEQ ID NO:42 is, for example, an electron transfer associated molecule.
  • SEQ ID NO:99 and SEQ ID NO:100 encoded by SEQ ID NO:43 and SEQ ID NO:44, respectively, are, for example, cytoskeletal molecules.
  • SEQ ID NO:103, SEQ ID NO:104, SEQ ID NO:105, SEQ ID NO:106, and SEQ ID NO:107, encoded by SEQ ID NO:47, SEQ ID NO:48, SEQ ID NO:49, SEQ ID NO:50, and SEQ ID NO:50, respectively, are, for example, organelle associated molecules.
  • SEQ ID NO:108 and SEQ ID NO:109, encoded by SEQ ID NO:51 and SEQ ID NO:52, respectively, are, for example, biochemical pathway molecules.
  • SEQ ID NO:110, SEQ ID NO:111, SEQ ID NO:112, and SEQ ID NO:113, encoded by SEQ ID NO:53, SEQ ID NO:54, SEQ ID NO:55, and SEQ ID NO:56, respectively, are, for example, molecules associated with growth and development.
  • the dithp of the present invention may be used for a variety of diagnostic and therapeutic purposes.
  • a dithp may be used to diagnose a particular condition, disease, or disorder associated with human molecules.
  • Such conditions, diseases, and disorders include, but are not limited to, a cell proliferative disorder, such as actinic keratosis, arteriosclerosis, atherosclerosis, bursitis, cirrhosis, hepatitis, mixed connective tissue disease (MCTD), myelofibrosis, paroxysmal nocturnal hemoglobinuria, polycythemia vera, psoriasis, primary thrombocythemia, and cancers including adenocarcinoma, leukemia, lymphoma, melanoma, myeloma, sarcoma, teratocarcinoma, and, in particular, a cancer of the adrenal gland, bladder, bone, bone marrow, brain, breast, cervix, gall bladder,
  • the dithp can be used to detect the presence of, or to quantify the amount of, a dithp-related polynucleotide in a sample. This information is then compared to information obtained from appropriate reference samples, and a diagnosis is established.
  • a polynucleotide complementary to a given dithp can inhibit or inactivate a therapeutically relevant gene related to the dithp.
  • the expression of dithp may be routinely assessed by hybridization-based methods to determine, for example, the tissue-specificity, disease-specificity, or developmental stage-specificity of dithp expression.
  • the level of expression of dithp may be compared among different cell types or tissues, among diseased and normal cell types or tissues, among cell types or tissues at different developmental stages, or among cell types or tissues undergoing various treatments.
  • This type of analysis is useful, for example, to assess the relative levels of dithp expression in fully or partially differentiated cells or tissues, to determine if changes in dithp expression levels are correlated with the development or progression of specific disease states, and to assess the response of a cell or tissue to a specific therapy, for example, in pharmacological or toxicological studies.
  • Methods for the analysis of dithp expression are based on hybridization and amplification technologies and include membrane-based procedures such as northern blot analysis, high-throughput procedures that utilize, for example, microarrays, and PCR-based procedures.
  • the dithp, their fragments, or complementary sequences may be used to identify the presence of and/or to determine the degree of similarity between two (or more) nucleic acid sequences.
  • the dithp may be hybridized to naturally occurring or recombinant nucleic acid sequences under appropriately selected temperatures and salt concentrations. Hybridization with a probe based on the nucleic acid sequence of at least one of the dithp allows for the detection of nucleic acid sequences, including genomic sequences, which are identical or related to the dithp of the Sequence Listing.
  • Probes may be selected from non-conserved or unique regions of at least one of the polynucleotides of SEQ ID NO:1-56 and tested for their ability to identify or amplify the target nucleic acid sequence using standard protocols. Polynucleotide sequences that are capable of hybridizing, in particular, to those shown in SEQ ID NO:1-56 and fragments thereof can be identified using various conditions of stringency. (See, e.g., Wahl, G.M. and S.L. Berger (1987) Methods Enzymol. 152:399-407; Kimmel, A.R. (1987) Methods Enzymol. 152:507-511.)
  • Hybridization conditions are discussed in "Definitions."
  • a probe for use in Southern or northern hybridization may be derived from a fragment of a dithp sequence, or its complement, that is up to several hundred nucleotides in length and is either single-stranded or double-stranded. Such probes may be hybridized in solution to biological materials such as plasmids, bacterial, yeast, or human artificial chromosomes, cleared or sectioned tissues, or to artificial substrates containing dithp. Microarrays are particularly suitable for identifying the presence of and detecting the level of expression for multiple genes of interest by examining gene expression correlated with, e.g., various stages of development, treatment with a drug or compound, or disease progression.
  • An array analogous to a dot or slot blot may be used to arrange and link polynucleotides to the surface of a substrate using one or more of the following: mechanical (vacuum), chemical, thermal, or UV bonding procedures.
  • Such an array may contain any number of dithp and may be produced by hand or by using available devices, materials, and machines.
  • Microarrays may be prepared, used, and analyzed using methods known in the art. (See, e.g.,
  • Probes may be labeled by either PCR or enzymatic techniques using a variety of commercially available reporter molecules.
  • commercial kits are available for radioactive and chemiluminescent labeling (Amersham Pharmacia Biotech) and for alkaline phosphatase labeling (Life Technologies).
  • dithp may be cloned into commercially available vectors for the production of RNA probes.
  • Such probes may be transcribed in the presence of at least one labeled nucleotide (e.g., 32P-ATP, Amersham Pharmacia Biotech).
  • Additionally, the polynucleotides of SEQ ID NO:1-56 or suitable fragments thereof can be used to isolate full length cDNA sequences utilizing hybridization and/or amplification procedures well known in the art, e.g., cDNA library screening, PCR amplification, etc.
  • the molecular cloning of such full length cDNA sequences may employ the method of cDNA library screening with probes using the hybridization, stringency, washing, and probing strategies described above and in Ausubel, supra, Chapters 3, 5, and 6. These procedures may also be employed with genomic libraries to isolate genomic sequences of dithp in order to analyze, e.g., regulatory elements. Genetic Mapping
  • Gene identification and mapping are important in the investigation and treatment of almost all conditions, diseases, and disorders. Cancer, cardiovascular disease, Alzheimer's disease, arthritis, diabetes, and mental illnesses are of particular interest. Each of these conditions is more complex than the single gene defects of sickle cell anemia or cystic fibrosis, with select groups of genes being predictive of predisposition for a particular condition, disease, or disorder.
  • cardiovascular disease may result from malfunctioning receptor molecules that fail to clear cholesterol from the bloodstream
  • diabetes may result when a particular individual's immune system is activated by an infection and attacks the insulin-producing cells of the pancreas.
  • Alzheimer's disease has been linked to a gene on chromosome 21; other studies predict a different gene and location.
  • Mapping of disease genes is a complex and reiterative process and generally proceeds from genetic linkage analysis to physical mapping.
  • a genetic linkage map traces parts of chromosomes that are inherited in the same pattern as the condition.
  • Statistics link the inheritance of particular conditions to particular regions of chromosomes, as defined by RFLP or other markers.
  • RFLP restriction fragment length polymorphism
  • Occasionally, genetic markers and their locations are known from previous studies. More often, however, the markers are simply stretches of DNA that differ among individuals. Examples of genetic linkage maps can be found in various scientific journals or at the Online Mendelian Inheritance in Man (OMIM) World Wide Web site.
  • dithp sequences may be used to generate hybridization probes useful in chromosomal mapping of naturally occurring genomic sequences.
  • Either coding or noncoding sequences of dithp may be used, and in some instances, noncoding sequences may be preferable over coding sequences.
  • conservation of a dithp coding sequence among members of a multi-gene family may potentially cause undesired cross hybridization during chromosomal mapping.
  • sequences may be mapped to a particular chromosome, to a specific region of a chromosome, or to artificial chromosome constructions, e.g., human artificial chromosomes (HACs), yeast artificial chromosomes (YACs), bacterial artificial chromosomes (BACs), bacterial P1 constructions, or single chromosome cDNA libraries.
  • Fluorescent in situ hybridization may be correlated with other physical chromosome mapping techniques and genetic map data. (See, e.g., Meyers, supra, pp. 965-968.) Correlation between the location of dithp on a physical chromosomal map and a specific disorder, or a predisposition to a specific disorder, may help define the region of DNA associated with that disorder.
  • the dithp sequences may also be used to detect polymorphisms that are genetically linked to the inheritance of a particular condition, disease, or disorder.
  • In situ hybridization of chromosomal preparations and genetic mapping techniques may be used for extending existing genetic maps. Often the placement of a gene on the chromosome of another mammalian species, such as mouse, may reveal associated markers even if the number or arm of the corresponding human chromosome is not known. These new marker sequences can be mapped to human chromosomes and may provide valuable information to investigators searching for disease genes using positional cloning or other gene discovery techniques.
  • any sequences mapping to that area may represent associated or regulatory genes for further investigation.
  • the nucleotide sequences of the subject invention may also be used to detect differences in chromosomal architecture due to translocation, inversion, etc., among normal, carrier, or affected individuals.
  • Once a disease-associated gene is mapped to a chromosomal region, the gene must be cloned in order to identify mutations or other alterations (e.g., translocations or inversions) that may be correlated with disease.
  • This process requires a physical map of the chromosomal region containing the disease-gene of interest along with associated markers. A physical map is necessary for determining the nucleotide sequence of and order of marker genes on a particular chromosomal region. Physical mapping techniques are well known in the art and require the generation of overlapping sets of cloned DNA fragments from a particular organelle, chromosome, or genome. These clones are analyzed to reconstruct and catalog their order. Once the position of a marker is determined, the DNA from that region is obtained by consulting the catalog and selecting clones from that region. The gene of interest is located through positional cloning techniques using hybridization or similar methods.
  • the dithp of the present invention may be used to design probes useful in diagnostic assays.
  • Such assays, well known to those skilled in the art, may be used to detect or confirm conditions, disorders, or diseases associated with abnormal levels of dithp expression.
  • Labeled probes developed from dithp sequences are added to a sample under hybridizing conditions of desired stringency.
  • dithp, or fragments or oligonucleotides derived from dithp, may be used as primers in amplification steps prior to hybridization.
  • the amount of hybridization complex formed is quantified and compared with standards for that cell or tissue. If dithp expression varies significantly from the standard, the assay indicates the presence of the condition, disorder, or disease.
  • Qualitative or quantitative diagnostic methods may include northern, dot blot, or other membrane or dip-stick based technologies or multiple-sample format technologies such as PCR, enzyme-linked immunosorbent assay (ELISA)-like, pin, or chip-based assays.
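One way to make "varies significantly from the standard" concrete is a z-score against replicate standard measurements, sketched below. The z-score test, the function name, and the cutoff of 2.0 are assumptions chosen for illustration; the text does not prescribe a particular statistical method.

```python
from statistics import mean, stdev

def deviates_from_standard(sample_signal, standard_signals, z_cutoff=2.0):
    """Return (flag, z) for a sample's hybridization signal.

    standard_signals holds replicate signals for the normal cell or tissue;
    z_cutoff is a hypothetical significance threshold.
    """
    mu = mean(standard_signals)
    sigma = stdev(standard_signals)  # sample standard deviation
    z = (sample_signal - mu) / sigma
    return abs(z) > z_cutoff, z

standards = [100, 110, 90, 105, 95]
print(deviates_from_standard(130, standards))
print(deviates_from_standard(104, standards))
```

A real assay would also need enough standard replicates for the standard deviation to be meaningful, which this sketch does not check.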
  • the probes described above may also be used to monitor the progress of conditions, disorders, or diseases associated with abnormal levels of dithp expression, or to evaluate the efficacy of a particular therapeutic treatment.
  • the candidate probe may be identified from the dithp that are specific to a given human tissue and have not been observed in GenBank or other genome databases. Such a probe may be used in animal studies, preclinical tests, clinical trials, or in monitoring the treatment of an individual patient.
  • standard expression is established by methods well known in the art for use as a basis of comparison, samples from patients affected by the disorder or disease are combined with the probe to evaluate any deviation from the standard profile, and a therapeutic agent is administered and effects are monitored to generate a treatment profile. Efficacy is evaluated by determining whether the expression progresses toward or returns to the standard normal pattern. Treatment profiles may be generated over a period of several days or several months. Statistical methods well known to those skilled in the art may be used to determine the significance of such therapeutic agents.
  • the polynucleotides are also useful for identifying individuals from minute biological samples, for example, by matching the RFLP pattern of a sample's DNA to that of an individual's DNA.
  • the polynucleotides of the present invention can also be used to determine the actual base-by-base DNA sequence of selected portions of an individual's genome. These sequences can be used to prepare PCR primers for amplifying and isolating such selected DNA, which can then be sequenced. Using this technique, an individual can be identified through a unique set of DNA sequences. Once a unique ID database is established for an individual, positive identification of that individual can be made from extremely small tissue samples.
  • oligonucleotide primers derived from the dithp of the invention may be used to detect single nucleotide polymorphisms (SNPs). SNPs are substitutions, insertions and deletions that are a frequent cause of inherited or acquired genetic disease in humans. Methods of SNP detection include, but are not limited to, single-stranded conformation polymorphism (SSCP) and fluorescent SSCP (fSSCP) methods.
  • oligonucleotide primers derived from dithp are used to amplify DNA using the polymerase chain reaction (PCR).
  • the DNA may be derived, for example, from diseased or normal tissue, biopsy samples, bodily fluids, and the like.
  • SNPs in the DNA cause differences in the secondary and tertiary structures of PCR products in single-stranded form, and these differences are detectable using gel electrophoresis in non-denaturing gels.
  • the oligonucleotide primers are fluorescently labeled, which allows detection of the amplimers in high-throughput equipment such as DNA sequencing machines.
  • Additionally, sequence database analysis methods, termed in silico SNP (isSNP), are capable of identifying polymorphisms by comparing the sequences of individual overlapping DNA fragments which assemble into a common consensus sequence.
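The in silico comparison of overlapping fragments can be sketched as a column-wise scan of pre-aligned reads. The alignment is assumed to exist already (with '-' marking uncovered columns), and the `min_minor` support threshold and function name are hypothetical; this is an illustration of the idea, not the isSNP method itself.

```python
from collections import Counter

def in_silico_snps(aligned_reads, min_minor=2):
    """Report candidate SNP columns in a set of aligned, equal-length reads.

    A column is reported as (position, major_base, minor_base) when the
    second most common base is supported by at least min_minor reads,
    which helps separate polymorphisms from isolated sequencing errors.
    """
    length = len(aligned_reads[0])
    snps = []
    for pos in range(length):
        # Count only called bases; '-' and 'N' are ignored.
        counts = Counter(r[pos] for r in aligned_reads if r[pos] in "ACGT")
        if len(counts) < 2:
            continue
        (major, _), (minor, minor_n) = counts.most_common(2)
        if minor_n >= min_minor:
            snps.append((pos, major, minor))
    return snps

reads = [
    "ACGTACGT--",
    "ACGTACGTAC",
    "--GTTCGTAC",
    "--GTTCGTAC",
    "ACGTACGTAC",
]
print(in_silico_snps(reads))
```

Requiring more than one read for the minor base is a simple guard against the occasional sequencing errors that automated methods are expected to produce.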
  • DNA-based identification techniques are critical in forensic technology.
  • DNA sequences taken from very small biological samples such as tissues, e.g., hair or skin, or body fluids, e.g., blood, saliva, semen, etc., can be amplified using, e.g., PCR, to identify individuals.
  • reagents capable of identifying the source of a particular tissue.
  • Appropriate reagents can comprise, for example, DNA probes or primers prepared from the sequences of the present invention that are specific for particular tissues. Panels of such reagents can identify tissue by species and/or by organ type. In a similar fashion, these reagents can be used to screen tissue cultures for contamination.
  • polynucleotides of the present invention can also be used as molecular weight markers on nucleic acid gels or Southern blots, as diagnostic probes for the presence of a specific mRNA in a particular cell type, in the creation of subtracted cDNA libraries which aid in the discovery of novel polynucleotides, in selection and synthesis of oligomers for attachment to an array or other support, and as an antigen to elicit an immune response.
  • the dithp of the invention or their mammalian homologs may be "knocked out" in an animal model system using homologous recombination in embryonic stem (ES) cells.
  • Such techniques are well known in the art and are useful for the generation of animal models of human disease. (See, e.g., U.S. Patent Number 5,175,383 and U.S. Patent Number 5,767,337.)
  • Mouse ES cells, such as the mouse 129/SvJ cell line, are derived from the early mouse embryo and grown in culture.
  • the ES cells are transformed with a vector containing the gene of interest disrupted by a marker gene, e.g., the neomycin phosphotransferase gene (neo; Capecchi, M.R. (1989) Science 244:1288-1292).
  • the vector integrates into the corresponding region of the host genome by homologous recombination.
  • homologous recombination takes place using the Cre-loxP system to knockout a gene of interest in a tissue- or developmental stage-specific manner (Marth, J.D. (1996) Clin. Invest. 97:1999-2002; Wagner, K.U. et al. (1997) Nucleic Acids Res. 25:4323-4330).
  • Transformed ES cells are identified and microinjected into mouse cell blastocysts such as those from the C57BL/6 mouse strain.
  • the blastocysts are surgically transferred to pseudopregnant dams, and the resulting chimeric progeny are genotyped and bred to produce heterozygous or homozygous strains.
  • Transgenic animals thus generated may be tested with potential therapeutic or toxic agents.
  • the dithp of the invention may also be manipulated in vitro in ES cells derived from human blastocysts. Human ES cells have the potential to differentiate into at least eight separate cell lineages including endoderm, mesoderm, and ectodermal cell types.
  • cell lineages differentiate into, for example, neural cells, hematopoietic lineages, and cardiomyocytes (Thomson, J.A. et al. (1998) Science 282:1145-1147).
  • the dithp of the invention can also be used to create "knockin" humanized animals (pigs) or transgenic animals (mice or rats) to model human disease. With knockin technology, a region of dithp is injected into animal ES cells, and the injected sequence integrates into the animal cell genome. Transformed cells are injected into blastulae, and the blastulae are implanted as described above.
  • Transgenic progeny or inbred lines are studied and treated with potential pharmaceutical agents to obtain information on treatment of a human disease.
  • a mammal inbred to overexpress dithp resulting, e.g., in the secretion of DITHP in its milk, may also serve as a convenient source of that protein (Janne, J. et al. (1998) Biotechnol. Annu. Rev. 4:55-74).
  • DITHP encoded by polynucleotides of the present invention may be used to screen for molecules that bind to or are bound by the encoded polypeptides.
  • the binding of the polypeptide and the molecule may activate (agonist), increase, inhibit (antagonist), or decrease activity of the polypeptide or the bound molecule.
  • Examples of such molecules include antibodies, oligonucleotides, proteins (e.g., receptors), or small molecules.
  • the molecule is closely related to the natural ligand of the polypeptide, e.g., a ligand or fragment thereof, a natural substrate, or a structural or functional mimetic.
  • the molecule can be closely related to the natural receptor to which the polypeptide binds, or to at least a fragment of the receptor, e.g., the active site. In either case, the molecule can be rationally designed using known techniques.
  • the screening for these molecules involves producing appropriate cells which express the polypeptide, either as a secreted protein or on the cell membrane.
  • cells include cells from mammals, yeast, Drosophila, or E. coli.
  • Cells expressing the polypeptide or cell membrane fractions which contain the expressed polypeptide are then contacted with a test compound and binding, stimulation, or inhibition of activity of either the polypeptide or the molecule is analyzed.
  • An assay may simply test binding of a candidate compound to the polypeptide, wherein binding is detected by a fluorophore, radioisotope, enzyme conjugate, or other detectable label. Alternatively, the assay may assess binding in the presence of a labeled competitor.
  • the assay can be carried out using cell-free preparations, polypeptide/molecule affixed to a solid support, chemical libraries, or natural product mixtures.
  • the assay may also simply comprise the steps of mixing a candidate compound with a solution containing a polypeptide, measuring polypeptide/molecule activity or binding, and comparing the polypeptide/molecule activity or binding to a standard.
  • an ELISA assay using, e.g., a monoclonal or polyclonal antibody can measure polypeptide level in a sample.
  • the antibody can measure polypeptide level by either binding, directly or indirectly, to the polypeptide or by competing with the polypeptide for a substrate.
  • All of the above assays can be used in a diagnostic or prognostic context.
  • the molecules discovered using these assays can be used to treat disease or to bring about a particular result in a patient (e.g., blood vessel growth) by activating or inhibiting the polypeptide/molecule.
  • the assays can discover agents which may inhibit or enhance the production of the polypeptide from suitably manipulated cells or tissues.
  • a transcript image represents the global pattern of gene expression by a particular tissue or cell type. Global gene expression patterns are analyzed by quantifying the number of expressed genes and their relative abundance under given conditions and at a given time. (See Seilhamer et al., "Comparative Gene Transcript Analysis," U.S. Patent Number 5,840,484, expressly incorporated by reference herein.)
  • a transcript image may be generated by hybridizing the polynucleotides of the present invention or their complements to the totality of transcripts or reverse transcripts of a particular tissue or cell type.
  • the hybridization takes place in high-throughput format, wherein the polynucleotides of the present invention or their complements comprise a subset of a plurality of elements on a microarray.
  • the resultant transcript image would provide a profile of gene activity pertaining to human molecules for diagnostics and therapeutics.
  • Transcript images which profile dithp expression may be generated using transcripts isolated from tissues, cell lines, biopsies, or other biological samples.
  • the transcript image may thus reflect dithp expression in vivo, as in the case of a tissue or biopsy sample, or in vitro, as in the case of a cell line.
  • Transcript images which profile dithp expression may also be used in conjunction with in vitro model systems and preclinical evaluation of pharmaceuticals, as well as toxicological testing of industrial and naturally-occurring environmental compounds.
  • All compounds induce characteristic gene expression patterns, frequently termed molecular fingerprints or toxicant signatures, which are indicative of mechanisms of action and toxicity (Nuwaysir, E. F. et al. (1999) Mol. Carcinog. 24:153-159; Steiner, S. and Anderson, N.L. (2000) Toxicol. Lett. 112-113:467-71, expressly incorporated by reference herein).
  • If a test compound has a signature similar to that of a compound with known toxicity, it is likely to share those toxic properties.
  • These fingerprints or signatures are most useful and refined when they contain expression information from a large number of genes and gene families.
  • Ideally, a genome-wide measurement of expression provides the highest quality signature. Even genes whose expression is not altered by any tested compounds are important as well, as the levels of expression of these genes are used to normalize the rest of the expression data. The normalization procedure is useful for comparison of expression data after treatment with different compounds.
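The normalization idea can be sketched by scaling each measurement by the mean signal of genes whose expression is assumed to be unaltered by any tested compound. The gene names, the choice of the arithmetic mean as the scaling factor, and the function name are illustrative assumptions only.

```python
from statistics import mean

def normalize_to_reference(expression, reference_genes):
    """Scale every gene's signal by the mean signal of the reference genes.

    expression maps gene name to raw signal; reference_genes lists genes
    assumed to be unaffected by treatment (hypothetical names below).
    """
    factor = mean(expression[g] for g in reference_genes)
    return {gene: value / factor for gene, value in expression.items()}

raw = {"refA": 200.0, "refB": 400.0, "geneX": 900.0}
print(normalize_to_reference(raw, ["refA", "refB"]))
```

After this scaling, values from samples treated with different compounds are expressed relative to the same internal baseline and can be compared directly.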
  • the toxicity of a test compound is assessed by treating a biological sample containing nucleic acids with the test compound.
  • Nucleic acids that are expressed in the treated biological sample are hybridized with one or more probes specific to the polynucleotides of the present l o invention, so that transcript levels corresponding to the polynucleotides of the present invention may be quantified.
  • the transcript levels in the treated biological sample are compared with levels in an untreated biological sample. Differences in the tianscript levels between the two samples are indicative of a toxic response caused by the test compound in the treated sample.
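The treated-versus-untreated comparison can be sketched as a log2 fold-change filter. The two-fold threshold (`min_abs_log2=1.0`), the gene labels, and the function name are assumptions for illustration; the text does not state what size of difference indicates a toxic response.

```python
import math

def differential_transcripts(treated, untreated, min_abs_log2=1.0):
    """Return {gene: log2 fold change} for transcripts whose level differs
    between treated and untreated samples by at least the given threshold."""
    changed = {}
    for gene in treated.keys() & untreated.keys():
        # Skip non-positive signals, for which a ratio is undefined.
        if treated[gene] <= 0 or untreated[gene] <= 0:
            continue
        log2_fc = math.log2(treated[gene] / untreated[gene])
        if abs(log2_fc) >= min_abs_log2:
            changed[gene] = log2_fc
    return changed

treated = {"g1": 800, "g2": 100, "g3": 50}
untreated = {"g1": 200, "g2": 110, "g3": 200}
print(differential_transcripts(treated, untreated))
```

Positive values indicate transcripts that increased after treatment and negative values indicate transcripts that decreased.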
  • Proteome refers to the global pattern of protein expression in a particular tissue or cell type.
  • Proteome expression patterns, or profiles, are analyzed by quantifying the number of expressed proteins and their relative abundance under given conditions and at a given time. A profile of a cell's proteome may thus be generated by
  • The separation is achieved using two-dimensional gel electrophoresis, in which proteins from a sample are separated by isoelectric focusing in the first dimension, and then according to molecular weight by sodium dodecyl sulfate slab gel electrophoresis in the second dimension (Steiner and Anderson, supra).
  • The proteins are visualized in the gel as discrete and uniquely positioned spots, typically by
  • The optical density of each protein spot is generally proportional to the level of the protein in the sample.
  • The optical densities of equivalently positioned protein spots from different samples, for example, from biological samples either treated or untreated with a test compound or therapeutic agent, are compared to identify any changes in protein spot density related to the treatment.
  • The proteins in the spots are partially sequenced using, for example, standard methods employing chemical or enzymatic cleavage followed by mass spectrometry.
  • The identity of the protein in a spot may be determined by comparing its partial sequence, preferably of at least 5 contiguous amino acid residues, to the polypeptide sequences of the present invention. In some cases, further sequence data may be obtained for definitive protein identification.
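  • By way of illustration only, the comparison of a partial sequence to a set of polypeptide sequences may be sketched as follows. Exact substring matching is an assumption of this example, and the polypeptide names and sequences are hypothetical; only the minimum of 5 contiguous residues comes from the text above.

```python
def matching_polypeptides(partial_seq, polypeptides, min_len=5):
    """Names of polypeptides sharing at least `min_len` contiguous residues
    with the partial sequence obtained from a protein spot."""
    # Every window of min_len residues in the partial sequence.
    windows = {
        partial_seq[i:i + min_len]
        for i in range(len(partial_seq) - min_len + 1)
    }
    return [
        name for name, seq in polypeptides.items()
        if any(window in seq for window in windows)
    ]

# Hypothetical polypeptide sequences standing in for a sequence listing.
polypeptides = {
    "P1": "MKTAYIAKQRQISFVK",
    "P2": "MGSSHHHHHHSSGLVPR",
}
hits = matching_polypeptides("AYIAKQ", polypeptides)
```

A partial sequence shorter than five residues yields no windows and therefore no match.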
  • A proteomic profile may also be generated using antibodies specific for DITHP to quantify the levels of DITHP expression.
  • The antibodies are used as elements on a microarray, and protein expression levels are quantified by exposing the microarray to the sample and detecting the levels of protein bound to each array element (Lueking, A. et al. (1999) Anal. Biochem. 270:103-11; Mendoza, L.G. et al. (1999) Biotechniques 27:778-88).
  • Detection may be performed by a variety of methods known in the art, for example, by reacting the proteins in the sample with a thiol- or amino-reactive fluorescent compound and detecting the amount of fluorescence bound at each array element.
  • Toxicant signatures at the proteome level are also useful for toxicological screening, and should be analyzed in parallel with toxicant signatures at the transcript level.
  • There is a poor correlation between transcript and protein abundances for some proteins in some tissues (Anderson, N.L. and Seilhamer, J. (1997) Electrophoresis 18:533-537), so proteome toxicant signatures may be useful in the analysis of compounds which do not significantly affect the transcript image, but which alter the proteomic profile.
  • The analysis of transcripts in body fluids is difficult, due to rapid degradation of mRNA, so proteomic profiling may be more reliable and informative in such cases.
  • The toxicity of a test compound is assessed by treating a biological sample containing proteins with the test compound. Proteins that are expressed in the treated biological sample are separated so that the amount of each protein can be quantified. The amount of each protein is compared to the amount of the corresponding protein in an untreated biological sample. A difference in the amount of protein between the two samples is indicative of a toxic response to the test compound in the treated sample. Individual proteins are identified by sequencing the amino acid residues of the individual proteins and comparing these partial sequences to the DITHP encoded by polynucleotides of the present invention.
  • The toxicity of a test compound is assessed by treating a biological sample containing proteins with the test compound. Proteins from the biological sample are incubated with antibodies specific to the DITHP encoded by polynucleotides of the present invention. The amount of protein recognized by the antibodies is quantified. The amount of protein in the treated biological sample is compared with the amount in an untreated biological sample. A difference in the amount of protein between the two samples is indicative of a toxic response to the test compound in the treated sample.
  • Transcript images may be used to profile dithp expression in distinct tissue types. This process can be used to determine human molecule activity in a particular tissue type relative to this activity in a different tissue type. Transcript images may be used to generate a profile of dithp expression characteristic of diseased tissue.
  • Transcript images of tissues before and after treatment may be used for diagnostic purposes, to monitor the progression of disease, and to monitor the efficacy of drug treatments for diseases which affect the activity of human molecules.
  • Transcript images of cell lines can be used to assess human molecule activity and/or to identify cell lines that lack or misregulate this activity. Such cell lines may then be treated with pharmaceutical agents, and a transcript image following treatment may indicate the efficacy of these agents in restoring desired levels of this activity.
  • A similar approach may be used to assess the toxicity of pharmaceutical agents as reflected by undesirable changes in human molecule activity.
  • Candidate pharmaceutical agents may be evaluated by comparing their associated transcript images with those of pharmaceutical agents of known effectiveness.
  • Antisense Molecules The polynucleotides of the present invention are useful in antisense technology. Antisense technology or therapy relies on the modulation of expression of a target protein through the specific binding of an antisense sequence to a target sequence encoding the target protein or directing its expression.
  • Agrawal, S., ed. (1996) Antisense Therapeutics, Humana Press Inc., Totowa NJ; Alama, A. et al. (1997) Pharmacol. Res. 36(3):171-178; Crooke, S.T. (1997) Adv. Pharmacol. 40:1-49; Sharma, H.W. and R.
  • An antisense sequence is a polynucleotide sequence capable of specifically hybridizing to at least a portion of the target sequence. Antisense sequences bind to cellular mRNA and/or genomic DNA, affecting translation and/or transcription. Antisense sequences can be DNA, RNA, or nucleic acid mimics and analogs. (See, e.g., Rossi, J.J. et al. (1991) Antisense Res. Dev. 1(3):285-288; Lee, R. et al.
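  • By way of illustration only, a DNA antisense sequence that is fully complementary to a portion of a target sequence may be derived as the reverse complement of that portion. The function name and the example target are hypothetical, and full complementarity is an assumption of this sketch; the text above requires only specific hybridization.

```python
# Complement table covering DNA and RNA input; the output is always DNA.
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")

def antisense_dna(target):
    """Reverse complement of a DNA or RNA target, written 5' to 3' as DNA."""
    return target.translate(_COMPLEMENT)[::-1]

# Hypothetical six-nucleotide portion of a target mRNA.
oligo = antisense_dna("AUGGCC")
```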
  • Antisense sequences can be produced ex vivo, such as by using any of the ABI nucleic acid synthesizer series (Applied Biosystems) or other automated systems known in the art.
  • Antisense sequences can also be produced biologically, such as by transforming an appropriate host cell with an expression vector containing the sequence of interest. (See, e.g., Agrawal, supra.)
  • Antisense sequences can be delivered intracellularly in the form of an expression plasmid which, upon transcription, produces a sequence complementary to at least a portion of the cellular sequence encoding the target protein.
  • Antisense sequences can also be introduced intracellularly through the use of viral vectors, such as retrovirus and adeno-associated virus vectors.
  • Other gene delivery mechanisms include liposome-derived systems, artificial viral envelopes, and other systems known in the art.
  • The nucleotide sequences encoding DITHP or fragments thereof may be inserted into an appropriate expression vector, i.e., a vector which contains the necessary elements for transcriptional and translational control of the inserted coding sequence in a suitable host.
  • Methods which are well known to those skilled in the art may be used to construct expression vectors containing sequences encoding DITHP and appropriate transcriptional and translational control elements. These methods include in vitro recombinant DNA techniques, synthetic techniques, and in vivo genetic recombination. (See, e.g., Sambrook, supra, Chapters 4, 8, 16, and 17; and Ausubel, supra, Chapters 9, 10, 13, and 16.)
  • A variety of expression vector/host systems may be utilized to contain and express sequences encoding DITHP. These include, but are not limited to, microorganisms such as bacteria transformed with recombinant bacteriophage, plasmid, or cosmid DNA expression vectors; yeast transformed with yeast expression vectors; insect cell systems infected with viral expression vectors (e.g., baculovirus); plant cell systems transformed with viral expression vectors (e.g., cauliflower mosaic virus, CaMV, or tobacco mosaic virus, TMV) or with bacterial expression vectors (e.g., Ti or pBR322 plasmids); or animal (mammalian) cell systems.
  • Expression vectors derived from retroviruses, adenoviruses, or herpes or vaccinia viruses, or from various bacterial plasmids, may be used for delivery of nucleotide sequences to the targeted organ, tissue, or cell population.
  • Di Nicola, M. et al. (1998) Cancer Gen. Ther. 5(6):350-356; Yu, M. et al. (1993) Proc. Natl. Acad. Sci. USA 90(13):6340-6344; Buller, R.M. et al. (1985) Nature 317(6040):813-815; McGregor, D.P. et al. (1994)
  • Sequences encoding DITHP can be transformed into cell lines using expression vectors which may contain viral origins of replication and/or endogenous expression elements and a selectable marker gene on the same or on a separate vector. Any number of selection systems may be used to recover transformed cell lines.
  • The dithp of the invention may be used for somatic or germline gene therapy.
  • Gene therapy may be performed to (i) correct a genetic deficiency (e.g., in the cases of severe combined immunodeficiency (SCID)-X1 disease characterized by X-linked inheritance (Cavazzana-Calvo, M. et al. (2000) Science 288:669-672), severe combined immunodeficiency syndrome associated with an inherited adenosine deaminase (ADA) deficiency (Blaese, R.M. et al. (1995) Science 270:475-480; Bordignon, C. et al.
  • conditionally lethal gene product e.g., in the case of cancers which result from unregulated cell proliferation
  • a protein which affords protection against intracellular parasites e.g., against human retroviruses, such as human immunodeficiency virus (HIV) (Baltimore, D. (1988) Nature 335:395-396; Poeschla, E. et al. (1996) Proc. Natl. Acad. Sci. USA.
  • hepatitis B or C virus HBV, HCV
  • fungal parasites such as Candida albicans and Paracoccidioides brasiliensis
  • protozoan parasites such as Plasmodium falciparum and Trypanosoma cruzi.
  • The expression of dithp from an appropriate population of transduced cells may alleviate the clinical manifestations caused by the genetic deficiency.
  • Diseases or disorders caused by deficiencies in dithp are treated by constructing mammalian expression vectors comprising dithp and introducing these vectors by mechanical means into dithp-deficient cells.
  • Mechanical transfer technologies for use with cells in vivo or ex vitro include (i) direct DNA microinjection into individual cells, (ii) ballistic gold particle delivery, (iii) liposome-mediated transfection, (iv) receptor-mediated gene transfer, and (v) the use of DNA transposons (Morgan, R.A. and Anderson, W.F. (1993) Annu. Rev. Biochem. 62:191-217; Ivics, Z. (1997) Cell 91:501-510; Boulay, J-L. and Recipon, H. (1998) Curr. Opin. Biotechnol. 9:445-450).
  • Expression vectors that may be effective for the expression of dithp include, but are not limited to, the PCDNA 3.1, EPITAG, PRCCMV2, PREP, PVAX vectors (Invitrogen, Carlsbad CA), PCMV-SCRIPT, PCMV-TAG, PEGSH/PERV (Stratagene, La Jolla CA), and PTET-OFF, PTET-ON, PTRE2, PTRE2-LUC, PTK-HYG (Clontech, Palo Alto CA).
  • The dithp of the invention may be expressed using (i) a constitutively active promoter (e.g., from cytomegalovirus (CMV), Rous sarcoma virus (RSV), SV40 virus, thymidine kinase (TK), or β-actin genes), (ii) an inducible promoter (e.g., the tetracycline-regulated promoter (Gossen, M. and Bujard, H. (1992) Proc. Natl. Acad. Sci. U.S.A. 89:5547-5551; Gossen, M. et al. (1995) Science 268:1766-1769; Rossi, F.M.V. and Blau, H.M.
  • Invitrogen); the FK506/rapamycin inducible promoter; or the RU486/mifepristone inducible promoter (Rossi, F.M.V. and Blau, H.M., supra)), or (iii) a tissue-specific promoter or the native promoter of the endogenous gene encoding DITHP from a normal individual.
  • Liposome transformation kits (e.g., the PERFECT LIPID TRANSFECTION KIT, available from Invitrogen) allow one with ordinary skill in the art to deliver polynucleotides to target cells in culture and require minimal effort to optimize experimental parameters.
  • Transformation is performed using the calcium phosphate method (Graham, F.L. and Eb, A.J. (1973) Virology 52:456-467), or by electroporation (Neumann, E. et al. (1982) EMBO J. 1:841-845).
  • The introduction of DNA to primary cells requires modification of these standardized mammalian transfection protocols.
  • Diseases or disorders caused by genetic defects with respect to dithp expression are treated by constructing a retrovirus vector consisting of (i) dithp under the control of an independent promoter or the retrovirus long terminal repeat (LTR) promoter, (ii) appropriate RNA packaging signals, and (iii) a Rev-responsive element (RRE) along with additional retrovirus cis-acting RNA sequences and coding sequences required for efficient vector propagation.
  • Retrovirus vectors (e.g., PFB and PFBNEO) are commercially available (Stratagene) and are based on published data (Riviere, I. et al. (1995) Proc. Natl. Acad. Sci. U.S.A. 92:6733-6737), incorporated by reference herein.
  • The vector is propagated in an appropriate vector producing cell line (VPCL) that expresses an envelope gene with a tropism for receptors on the target cells or a promiscuous envelope protein such as VSVg (Armentano, D. et al. (1987) J. Virol. 61:1647-1650; Bender, M.A. et al. (1987) J.
  • Propagation of retrovirus vectors, transduction of a population of cells (e.g., CD4+ T-cells), and the return of transduced cells to a patient are procedures well known to persons skilled in the art of gene therapy and have been well documented (Ranga, U. et al. (1997) J. Virol. 71:7020-7029; Bauer, G. et al. (1997) Blood 89:2259-2267; Bonyhadi, M.L. (1997) J. Virol. 71:4707-4716; Ranga, U. et al. (1998) Proc. Natl. Acad. Sci. U.S.A. 95:1201-1206; Su, L. (1997) Blood 89:2283-2290).
  • An adenovirus-based gene therapy delivery system is used to deliver dithp to cells which have one or more genetic abnormalities with respect to the expression of dithp.
  • The construction and packaging of adenovirus-based vectors are well known to those with ordinary skill in the art.
  • Replication defective adenovirus vectors have proven to be versatile for importing genes encoding immunoregulatory proteins into intact islets in the pancreas (Csete, M.E. et al. (1995) Transplantation 27:263-268). Potentially useful adenoviral vectors are described in U.S. Patent Number 5,707,618 to Armentano ("Adenovirus vectors for gene therapy"), hereby incorporated by reference.
  • For adenoviral vectors, see also Antinozzi, P.A. et al. (1999) Annu. Rev. Nutr. 19:511-544 and Verma, I.M. and Somia, N. (1997) Nature 18:389:239-242, both incorporated by reference herein.
  • A herpes-based gene therapy delivery system is used to deliver dithp to target cells which have one or more genetic abnormalities with respect to the expression of dithp.
  • Herpes simplex virus (HSV)-based vectors may be especially valuable for introducing dithp to cells of the central nervous system, for which HSV has a tropism.
  • The construction and packaging of herpes-based vectors are well known to those with ordinary skill in the art.
  • A replication-competent herpes simplex virus (HSV) type 1-based vector has been used to deliver a reporter gene to the eyes of primates (Liu, X. et al. (1999) Exp. Eye Res. 169:385-395).
  • The construction of a HSV-1 virus vector has also been disclosed in detail in U.S. Patent Number 5,804,413 to DeLuca ("Herpes simplex virus strains for gene transfer"), which is hereby incorporated by reference.
  • U.S. Patent Number 5,804,413 teaches the use of recombinant HSV d92 which consists of a genome containing at least one exogenous gene to be transferred to a cell under the control of the appropriate promoter for purposes including human gene therapy. Also taught by this patent are the construction and use of recombinant HSV strains deleted for ICP4, ICP27 and ICP22.
  • For HSV vectors, see also Goins, W.F. et al. (1999) J. Virol. 73:519-532 and Xu, H. et al. (1994) Dev. Biol.
  • An alphavirus (positive, single-stranded RNA virus) vector is used to deliver dithp to target cells.
  • SFV Semliki Forest Virus
  • During alphavirus RNA replication, a subgenomic RNA is generated that normally encodes the viral capsid proteins.
  • This subgenomic RNA replicates to higher levels than the full-length genomic RNA, resulting in the overproduction of capsid proteins relative to the viral proteins with enzymatic activity (e.g., protease and polymerase).
  • While alphavirus infection is typically associated with cell lysis within a few days, the ability to establish a persistent infection in hamster normal kidney cells (BHK-21) with a variant of Sindbis virus (SIN) indicates that the lytic replication of alphaviruses can be altered to suit the needs of the gene therapy application (Dryga, S.A. et al. (1997) Virology 228:74-83).
  • The specific transduction of a subset of cells in a population may require the sorting of cells prior to transduction.
  • The methods of manipulating infectious cDNA clones of alphaviruses, performing alphavirus cDNA and RNA transfections, and performing alphavirus infections, are well known to those with ordinary skill in the art.
  • Anti-DITHP antibodies may be used to analyze protein expression levels. Such antibodies include, but are not limited to, polyclonal, monoclonal, chimeric, single chain, and Fab fragments. For descriptions of and protocols of antibody technologies, see, e.g., Pound J.D. (1998) Immunochemical
  • The amino acid sequence encoded by the dithp of the Sequence Listing may be analyzed by appropriate software (e.g., LASERGENE NAVIGATOR software, DNASTAR) to determine regions of high immunogenicity.
  • The optimal sequences for immunization are selected from the C-terminus, the N-terminus, and those intervening, hydrophilic regions of the polypeptide which are likely to be exposed to the external environment when the polypeptide is in its natural conformation.
  • Peptides used for antibody induction do not need to have biological activity; however, they must be antigenic.
  • Peptides used to induce specific antibodies may have an amino acid sequence consisting of at least five amino acids, preferably at least 10 amino acids, and most preferably at least 15 amino acids.
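  • By way of illustration only, the selection of a hydrophilic region of about 15 residues may be sketched with a sliding-window average over the Kyte-Doolittle hydropathy scale. The use of that scale, the window width, and the example sequence are assumptions of this sketch; it is not the method of the LASERGENE software named above.

```python
# Kyte-Doolittle hydropathy values; more negative means more hydrophilic.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def most_hydrophilic_window(seq, width=15):
    """Return (start, peptide, mean hydropathy) of the most hydrophilic
    window, or None when the sequence is shorter than the window."""
    best = None
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(KD[aa] for aa in window) / width
        if best is None or score < best[2]:
            best = (start, window, score)
    return best

# Hypothetical polypeptide: a charged stretch flanked by hydrophobic residues.
example = "L" * 10 + "KDE" * 5 + "V" * 10
candidate = most_hydrophilic_window(example)
```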
  • A peptide which mimics an antigenic fragment of the natural polypeptide may be fused with another protein such as keyhole limpet hemocyanin (KLH; Sigma, St. Louis MO) for antibody production.
  • A peptide encompassing an antigenic region may be expressed from a dithp, synthesized as described above, or purified from human cells. Procedures well known in the art may be used for the production of antibodies.
  • Various hosts including mice, goats, and rabbits, may be immunized by injection with a peptide. Depending on the host species, various adjuvants may be used to increase immunological response.
  • Peptides about 15 residues in length may be synthesized using an ABI 431A peptide synthesizer (Applied Biosystems) using fmoc-chemistry and coupled to KLH (Sigma) by reaction with M-maleimidobenzoyl-N-hydroxysuccinimide ester (Ausubel, 1995, supra).
  • Rabbits are immunized with the peptide-KLH complex in complete Freund's adjuvant.
  • The resulting antisera are tested for antipeptide activity by binding the peptide to plastic, blocking with 1% bovine serum albumin (BSA), reacting with rabbit antisera, washing, and reacting with radioiodinated goat anti-rabbit IgG.
  • Antisera with antipeptide activity are tested for anti-DITHP activity using protocols well known in the art, including ELISA, radioimmunoassay (RIA), and immunoblotting.
  • Isolated and purified peptide may be used to immunize mice (about 100 µg of peptide) or rabbits (about 1 mg of peptide). Subsequently, the peptide is radioiodinated and used to screen the immunized animals' B-lymphocytes for production of antipeptide antibodies. Positive cells are then used to produce hybridomas using standard techniques. About 20 mg of peptide is sufficient for labeling and screening several thousand clones. Hybridomas of interest are detected by screening with radioiodinated peptide to identify those fusions producing peptide-specific monoclonal antibody.
  • Wells of a multi-well plate are coated with affinity-purified, specific rabbit-anti-mouse (or suitable anti-species IgG) antibodies at 10 mg/ml.
  • The coated wells are blocked with 1% BSA and washed and exposed to supernatants from hybridomas. After incubation, the wells are exposed to radiolabeled peptide at 1 mg/ml.
  • Clones producing antibodies bind a quantity of labeled peptide that is detectable above background. Such clones are expanded and subjected to 2 cycles of cloning. Cloned hybridomas are injected into pristane-treated mice to produce ascites, and monoclonal antibody is purified from the ascitic fluid by affinity chromatography on protein A (Amersham Pharmacia Biotech). Several procedures for the production of monoclonal antibodies, including in vitro production, are described in Pound (supra). Monoclonal antibodies with antipeptide activity are tested for anti-DITHP activity using protocols well known in the art, including ELISA, RIA, and immunoblotting.
  • Antibody fragments containing specific binding sites for an epitope may also be generated.
  • Such fragments include, but are not limited to, the F(ab')2 fragments produced by pepsin digestion of the antibody molecule, and the Fab fragments generated by reducing the disulfide bridges of the F(ab')2 fragments.
  • Alternatively, construction of Fab expression libraries in filamentous bacteriophage allows rapid and easy identification of monoclonal fragments with desired specificity (Pound, supra, Chaps. 45-47).
  • Antibodies generated against polypeptide encoded by dithp can be used to purify and characterize full-length DITHP protein and its activity, binding partners, etc.
  • Anti-DITHP antibodies may be used in assays to quantify the amount of DITHP found in a particular human cell. Such assays include methods utilizing the antibody and a label to detect expression level under normal or disease conditions.
  • The peptides and antibodies of the invention may be used with or without modification or labeled by joining them, either covalently or noncovalently, with a reporter molecule.
  • Protocols for detecting and measuring protein expression using either polyclonal or monoclonal antibodies are well known in the art. Examples include ELISA, RIA, and fluorescent activated cell sorting (FACS). Such immunoassays typically involve the formation of complexes between the DITHP and its specific antibody and the measurement of such complexes. These and other assays are described in Pound (supra).
  • RNA was purchased from CLONTECH Laboratories, Inc. (Palo Alto CA) or isolated from various tissues. Some tissues were homogenized and lysed in guanidinium isothiocyanate, while others were homogenized and lysed in phenol or in a suitable mixture of denaturants, such as TRIZOL (Life Technologies), a monophasic solution of phenol and guanidine isothiocyanate. The resulting lysates were centrifuged over CsCl cushions or extracted with chloroform. RNA was precipitated with either isopropanol or sodium acetate and ethanol, or by other routine methods.
  • Poly(A+) RNA was isolated using oligo d(T)-coupled paramagnetic particles (Promega Corporation (Promega), Madison WI), OLIGOTEX latex particles (QIAGEN, Inc. (QIAGEN), Valencia CA), or an OLIGOTEX mRNA purification kit (QIAGEN).
  • Stratagene was provided with RNA and constructed the corresponding cDNA libraries. Otherwise, cDNA was synthesized and cDNA libraries were constructed with the UNIZAP vector system (Stratagene Cloning Systems, Inc. (Stratagene), La Jolla CA) or SUPERSCRIPT plasmid system (Life Technologies), using the recommended procedures or similar methods known in the art. (See, e.g., Ausubel, 1997, supra, Chapters 5.1 through 6.6.) Reverse transcription was initiated using oligo d(T) or random primers. Synthetic oligonucleotide adapters were ligated to double stranded cDNA, and the cDNA was digested with the appropriate restriction enzyme or enzymes. For most libraries, the cDNA was size-selected (300-1000 bp) using SEPHACRYL S1000, SEPHAROSE CL2B, or SEPHAROSE CL4B column chromatography (Amersham Pharmacia Biotech) or preparative agarose gel electrophoresis.
  • cDNAs were ligated into compatible restriction enzyme sites of the polylinker of a suitable plasmid, e.g., PBLUESCRIPT plasmid (Stratagene), PSPORT1 plasmid (Life Technologies), PCDNA2.1 plasmid (Invitrogen, Carlsbad CA), PBK-CMV plasmid (Stratagene), PCR2-TOPOTA plasmid (Invitrogen), PCMV-ICIS plasmid (Stratagene), pIGEN (Incyte Genomics, Palo Alto CA), pRARE (Incyte Genomics), or pINCY (Incyte Genomics), or derivatives thereof.
  • Recombinant plasmids were transformed into competent E. coli cells including XL1-Blue, XL1-BlueMRF, or SOLR from Stratagene or DH5α, DH10B, or ElectroMAX DH10B from Life Technologies.
  • Plasmids were recovered from host cells by in vivo excision using the UNIZAP vector system (Stratagene) or by cell lysis. Plasmids were purified using at least one of the following: the Magic or WIZARD Minipreps DNA purification system (Promega); the AGTC Miniprep purification kit (Edge BioSystems, Gaithersburg MD); and the QIAWELL 8, QIAWELL 8 Plus, and QIAWELL 8 Ultra plasmid purification systems or the R.E.A.L. PREP 96 plasmid purification kit (QIAGEN). Following precipitation, plasmids were resuspended in 0.1 ml of distilled water and stored, with or without lyophilization, at 4°C.
  • Plasmid DNA was amplified from host cell lysates using direct link PCR in a high-throughput format.
  • Host cell lysis and thermal cycling steps were carried out in a single reaction mixture. Samples were processed and stored in
  • cDNA sequencing reactions were processed using standard methods or high-throughput instrumentation such as the ABI CATALYST 800 thermal cycler (Applied Biosystems) or the PTC-200 thermal cycler (MJ Research) in conjunction with the HYDRA microdispenser (Robbins Scientific Corp., Sunnyvale CA) or the MICROLAB 2200 liquid transfer system (Hamilton).
  • cDNA sequencing reactions were prepared using reagents provided by Amersham Pharmacia Biotech or supplied in ABI sequencing kits such as the ABI PRISM BIGDYE Terminator cycle sequencing ready reaction kit (Applied Biosystems).
  • Electrophoretic separation of cDNA sequencing reactions and detection of labeled polynucleotides were carried out using the MEGABACE 1000 DNA sequencing system (Molecular Dynamics); the ABI PRISM 373 or 377 sequencing system (Applied Biosystems) in conjunction with standard ABI protocols and base calling software; or other sequence analysis systems known in the art. Reading frames within the cDNA sequences were identified using standard methods (reviewed in Ausubel, 1997, supra, Chapter 7.7). Some of the cDNA sequences were selected for extension using the techniques disclosed in Example VEH.
  • Sequences from chromatograms were subject to PHRED analysis and assigned a quality score.
  • The sequences having at least a required quality score were subject to various preprocessing editing pathways to eliminate, e.g., low quality 3' ends, vector and linker sequences, polyA tails, Alu repeats, mitochondrial and ribosomal sequences, bacterial contamination sequences, and sequences smaller than 50 base pairs.
  • low-information sequences and repetitive elements e.g., dinucleotide repeats, Alu repeats, etc.
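  • By way of illustration only, a small part of such preprocessing (a quality cutoff, removal of a 3' polyA tail, and removal of sequences smaller than 50 base pairs) may be sketched as follows. The quality cutoff of 20, the definition of a polyA tail as ten or more trailing A residues, and the read names are assumptions of this example; vector, repeat, and contamination screening are omitted.

```python
import re

def preprocess(reads, min_quality=20, min_length=50):
    """Filter and trim reads given as {name: (sequence, quality_score)}.

    min_quality and the polyA pattern are hypothetical choices; only the
    50 base pair minimum comes from the text.
    """
    kept = {}
    for name, (seq, quality) in reads.items():
        if quality < min_quality:
            continue  # below the required quality score
        trimmed = re.sub(r"A{10,}$", "", seq.upper())  # strip 3' polyA tail
        if len(trimmed) >= min_length:
            kept[name] = trimmed
    return kept

# Hypothetical reads.
reads = {
    "r1": ("ACGT" * 15 + "A" * 20, 30),  # good read with a polyA tail
    "r2": ("ACGT" * 5, 30),              # too short
    "r3": ("ACGT" * 15, 10),             # low quality
}
clean = preprocess(reads)
```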
  • Sequences were then subject to assembly procedures in which the sequences were assigned to gene bins (bins). Each sequence could only belong to one bin. Sequences in each gene bin were assembled to produce consensus sequences (templates). Subsequent new sequences were added to existing bins using BLASTn (v.1.4 WashU) and CROSSMATCH. Candidate pairs were identified as all BLAST hits having a quality score greater than or equal to 150. Alignments of at least 82% local identity were accepted into the bin. The component sequences from each bin were assembled using a version of PHRAP. Bins with several overlapping component sequences were assembled using DEEP PHRAP.
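  • By way of illustration only, the assignment of sequences to bins may be sketched as a grouping of sequences linked by accepted hits. The thresholds (score of at least 150, local identity of at least 82%) come from the text above; the union-find grouping, the tuple layout of a hit, and the sequence identifiers are assumptions of this sketch and do not reproduce the BLASTn, CROSSMATCH, or PHRAP procedures.

```python
def assign_bins(sequence_ids, hits, min_score=150, min_identity=82.0):
    """Group sequences into bins; each sequence ends up in exactly one bin.

    `hits` is a list of (seq_a, seq_b, score, percent_identity) tuples,
    a hypothetical summary of pairwise alignment results.
    """
    parent = {s: s for s in sequence_ids}

    def find(x):
        # Follow parent links to the bin representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score, identity in hits:
        if score >= min_score and identity >= min_identity:
            parent[find(a)] = find(b)  # merge the two bins

    bins = {}
    for s in sequence_ids:
        bins.setdefault(find(s), []).append(s)
    return sorted(sorted(members) for members in bins.values())

ids = ["s1", "s2", "s3", "s4"]
hits = [
    ("s1", "s2", 200, 90.0),  # accepted
    ("s2", "s3", 160, 85.0),  # accepted
    ("s3", "s4", 140, 99.0),  # score too low
    ("s1", "s4", 300, 70.0),  # identity too low
]
gene_bins = assign_bins(ids, hits)
```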
  • The orientation of each assembled template was determined based on the number and orientation of its component sequences. Template sequences as disclosed in the Sequence Listing correspond to sense strand sequences (the "forward" reading frames), to the best determination. The complementary (antisense) strands are inherently disclosed herein.
  • The component sequences which were used to assemble each template consensus sequence are listed in Table 5, along with their positions along the template nucleotide sequences.
  • Bins were compared against each other and those having local similarity of at least 82% were combined and reassembled. Reassembled bins having templates of insufficient overlap (less than 95% local identity) were re-split. Assembled templates were also subject to analysis by STITCHER/EXON MAPPER algorithms which analyze the probabilities of the presence of splice variants, alternatively spliced exons, splice junctions, differential expression of alternative spliced genes across tissue types or disease states, etc. These resulting bins were subject to several rounds of the above assembly procedures.
  • Bins were clone joined based upon clone information. If the 5' sequence of one clone was present in one bin and the 3' sequence from the same clone was present in a different bin, it was likely that the two bins actually belonged together in a single bin. The resulting combined bins underwent assembly procedures to regenerate the consensus sequences.
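  • By way of illustration only, the identification of bins to be clone joined may be sketched as follows. The data layout (a read-to-bin mapping and a read-to-clone-end mapping) and all identifiers are hypothetical.

```python
def bins_to_join(read_bin, read_clone):
    """Return sorted pairs of bins linked by the 5' and 3' reads of one clone.

    read_bin:   {read_name: bin_id}
    read_clone: {read_name: (clone_id, "5'" or "3'")}
    """
    ends = {}
    for read, (clone, end) in read_clone.items():
        ends.setdefault(clone, {})[end] = read_bin[read]

    pairs = set()
    for by_end in ends.values():
        five, three = by_end.get("5'"), by_end.get("3'")
        if five is not None and three is not None and five != three:
            pairs.add(tuple(sorted((five, three))))
    return sorted(pairs)

# Hypothetical reads: clone c1 spans two bins, clone c2 lies within one bin.
read_bin = {"r1": "bin1", "r2": "bin2", "r3": "bin3", "r4": "bin3"}
read_clone = {
    "r1": ("c1", "5'"), "r2": ("c1", "3'"),
    "r3": ("c2", "5'"), "r4": ("c2", "3'"),
}
joins = bins_to_join(read_bin, read_clone)
```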
  • The template sequences were further analyzed by translating each template in all three forward reading frames and searching each translation against the Pfam database of hidden Markov model-based protein families and domains using the HMMER software package (available to the public from Washington University School of Medicine, St. Louis MO). Regions of templates which, when translated, contain similarity to Pfam consensus sequences are reported in Table 3, along with descriptions of Pfam protein domains and families. Only those Pfam hits with an E-value of ≤ 1 × 10^-3 are reported.
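  • By way of illustration only, translation of a template in all three forward reading frames may be sketched as follows, using the standard genetic code. The function name and the example sequence are hypothetical; stop codons are written as "*" and codons containing unrecognized characters as "X".

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

def three_frame_translations(dna):
    """Translate a DNA sequence in forward frames 1, 2, and 3."""
    dna = dna.upper()
    frames = []
    for offset in range(3):
        codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
        frames.append("".join(CODON_TABLE.get(c, "X") for c in codons))
    return frames

frames = three_frame_translations("ATGGCCTAA")
```

Each resulting translation could then be searched separately, as the text describes for Pfam.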
  • Template sequences were also translated in all three forward reading frames, and each translation was searched against TMHMMER, a program that uses a hidden Markov model (HMM) to delineate transmembrane segments on protein sequences and determine orientation (Sonnhammer, E.L. et al. (1998) Proc. Sixth Intl. Conf. On Intelligent Systems for Mol. Biol., Glasgow et al., eds., The Am. Assoc. for Artificial Intelligence (AAAI) Press, Menlo Park, CA, and MIT Press, Cambridge, MA, pp. 175-182). Regions of templates which, when translated, contain similarity to signal peptide or transmembrane consensus sequences are reported in Table 4.
  • HMM hidden Markov model
  • HMMER analysis as reported in Tables 3 and 4 may support the results of BLAST analysis as reported in Table 2 or may suggest alternative or additional properties of template-encoded polypeptides not previously uncovered by BLAST or other analyses.
  • Template sequences are further analyzed using the bioinformatics tools listed in Table 8, or using sequence analysis software known in the art such as MACDNASIS PRO software (Hitachi Software Engineering, South San Francisco CA) and LASERGENE software (DNASTAR). Template sequences may be further queried against public databases such as the GenBank rodent, mammalian, vertebrate, prokaryote, and eukaryote databases.
  • a polypeptide of the invention may begin at any of the methionine residues within the full length translated polypeptide.
  • Polypeptide sequences were subsequently analyzed by querying against the GenBank protein database (GENPEPT, (GenBank version 126)). Full length polynucleotide sequences are also analyzed using MACDNASIS PRO software (Hitachi Software Engineering, South San Francisco
  • Polynucleotide and polypeptide sequence alignments are generated using default parameters specified by the CLUSTAL algorithm as incorporated into the MEGALIGN multisequence alignment program (DNASTAR), which also calculates the percent identity between aligned sequences.
  • Table 7 shows sequences with homology to the polypeptides of the invention as identified by BLAST analysis against the GenBank protein (GENPEPT) database.
  • Column 1 shows the polypeptide sequence identification number (SEQ ID NO:) for the polypeptide segments of the invention.
  • Column 2 shows the reading frame used in the translation of the polynucleotide sequences encoding the polypeptide segments.
  • Column 3 shows the length of the translated polypeptide segments.
  • Columns 4 and 5 show the start and stop nucleotide positions of the polynucleotide sequences encoding the polypeptide segments.
  • Column 6 shows the GenBank identification number (GI Number) of the nearest GenBank homolog.
  • Column 7 shows the probability score for the match between each polypeptide and its GenBank homolog.
  • Column 8 shows the annotation of the GenBank homolog.
  • Northern analysis is a laboratory technique used to detect the presence of a transcript of a gene and involves the hybridization of a labeled nucleotide sequence to a membrane on which RNAs from a particular cell type or tissue have been bound. (See, e.g., Sambrook, supra, ch. 7; Ausubel, 1995, supra, ch. 4 and 16.)
  • the product score takes into account both the degree of similarity between two sequences and the length of the sequence match.
  • the product score is a normalized value between 0 and 100, and is calculated as follows: the BLAST score is multiplied by the percent nucleotide identity and the product is divided by (5 times the length of the shorter of the two sequences).
  • the BLAST score is calculated by assigning a score of +5 for every base that matches in a high-scoring segment pair (HSP), and -4 for every mismatch. Two sequences may share more than one HSP (separated by gaps). If there is more than one HSP, then the pair with the highest BLAST score is used to calculate the product score.
  • the product score represents a balance between fractional overlap and quality in a BLAST alignment. For example, a product score of 100 is produced only for 100% identity over the entire length of the shorter of the two sequences being compared. A product score of 70 is produced either by 100% identity and 70% overlap at one end, or by 88% identity and 100% overlap at the other. A product score of 50 is produced either by 100% identity and 50% overlap at one end, or 79% identity and 100% overlap.
  • a tissue distribution profile is determined for each template by compiling the cDNA library tissue classifications of its component cDNA sequences.
  • Each component sequence is derived from a cDNA library constructed from a human tissue.
  • Each human tissue is classified into one of the following categories: cardiovascular system; connective tissue; digestive system; embryonic structures; endocrine system; exocrine glands; genitalia, female; genitalia, male; germ cells; hemic and immune system; liver; musculoskeletal system; nervous system; pancreas; respiratory system; sense organs; skin; stomatognathic system; unclassified/mixed; or urinary tract.
  • Template sequences, component sequences, and cDNA library/tissue information are found in the LIFESEQ GOLD database (Incyte Genomics, Palo Alto CA). Table 6 shows the tissue distribution profile for the templates of the invention. For each template, the three most frequently observed tissue categories are shown in column 3, along with the percentage of component sequences belonging to each category. Only tissue categories with percentage values of ≥10% are shown. A tissue distribution of "widely distributed" in column 3 indicates percentage values of <10% in all tissue categories.
  • Transcript images are generated as described in Seilhamer et al., "Comparative Gene Transcript Analysis," U.S. Patent Number 5,840,484, incorporated herein by reference.
  • Oligonucleotide primers designed using a dithp of the Sequence Listing are used to extend the nucleic acid sequence.
  • One primer is synthesized to initiate 5' extension of the template, and the other primer, to initiate 3' extension of the template.
  • the initial primers may be designed using OLIGO 4.06 software (National Biosciences, Inc. (National Biosciences), Plymouth MN), or another appropriate program, to be about 22 to 30 nucleotides in length, to have a GC content of about 50% or more, and to anneal to the target sequence at temperatures of about 68 °C to about 72 °C. Any stretch of nucleotides which would result in hairpin structures and primer-primer dimerizations is avoided.
  • Selected human cDNA libraries are used to extend the sequence. If more than one extension is necessary or desired, additional or nested sets of primers are designed. High fidelity amplification is obtained by PCR using methods well known in the art. PCR is performed in 96-well plates using the PTC-200 thermal cycler (MJ Research).
  • the reaction mix contains DNA template, 200 nmol of each primer, reaction buffer containing Mg2+, (NH4)2SO4, and β-mercaptoethanol, Taq DNA polymerase (Amersham Pharmacia Biotech), ELONGASE enzyme (Life Technologies), and Pfu DNA polymerase (Stratagene), with the following parameters for primer pair PCI A and PCI B: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 60°C, 1 min; Step 4: 68°C, 2 min; Step 5: Steps 2, 3, and 4 repeated 20 times; Step 6: 68°C, 5 min; Step 7: storage at 4°C.
  • the parameters for primer pair T7 and SK+ are as follows: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 57°C, 1 min; Step 4: 68°C, 2 min; Step 5: Steps 2, 3, and 4 repeated 20 times; Step 6:
  • the plate is scanned in a FLUOROSKAN II (Labsystems Oy) to measure the fluorescence of the sample and to quantify the concentration of DNA.
  • a 5 μl to 10 μl aliquot of the reaction mixture is analyzed by electrophoresis on a 1% agarose mini-gel to determine which reactions are successful in extending the sequence.
  • the extended nucleotides are desalted and concentrated, transferred to 384-well plates, digested with CviJI cholera virus endonuclease (Molecular Biology Research, Madison WI), and sonicated or sheared prior to religation into pUC 18 vector (Amersham Pharmacia Biotech).
  • the digested nucleotides are separated on low concentration (0.6 to 0.8%) agarose gels, fragments are excised, and agar digested with AGAR ACE (Promega).
  • Extended clones are religated using T4 ligase (New England Biolabs, Inc., Beverly MA) into pUC 18 vector (Amersham Pharmacia Biotech), treated with Pfu DNA polymerase (Stratagene) to fill-in restriction site overhangs, and transfected into competent E. coli cells. Transformed cells are selected on antibiotic-containing media, individual colonies are picked and cultured overnight at 37 °C in 384-well plates in LB/2x carbenicillin liquid media.
  • the cells are lysed, and DNA is amplified by PCR using Taq DNA polymerase (Amersham Pharmacia Biotech) and Pfu DNA polymerase (Stratagene) with the following parameters: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 60°C, 1 min; Step 4: 72°C, 2 min; Step 5: steps 2, 3, and 4 repeated 29 times; Step 6: 72°C, 5 min; Step 7: storage at 4°C. DNA is quantified by PICOGREEN reagent (Molecular Probes) as described above. Samples with low DNA recoveries are reamplified using the same conditions as described above.
  • Samples are diluted with 20% dimethylsulfoxide (1:2, v/v), and sequenced using DYENAMIC energy transfer sequencing primers and the DYENAMIC DIRECT kit (Amersham Pharmacia Biotech) or the ABI PRISM BIGDYE Terminator cycle sequencing ready reaction kit (Applied Biosystems).
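
The product score described in the excerpts above (BLAST score multiplied by percent nucleotide identity, divided by five times the length of the shorter sequence, with +5 per match and -4 per mismatch inside an HSP) can be sketched as follows. This is a minimal illustration only: the function names and the representation of an HSP as a (matches, mismatches) pair are assumptions made for the example, not part of the disclosed method or of any BLAST software.

```python
from typing import List, Tuple

# Hypothetical representation: one HSP is (matching bases, mismatching bases).
Hsp = Tuple[int, int]


def blast_score(matches: int, mismatches: int) -> int:
    """Score one HSP as described in the text: +5 per match, -4 per mismatch."""
    return 5 * matches - 4 * mismatches


def product_score(hsp_score: float, percent_identity: float, shorter_length: int) -> float:
    """BLAST score x percent identity / (5 x length of the shorter sequence)."""
    if shorter_length <= 0:
        raise ValueError("shorter_length must be positive")
    return hsp_score * percent_identity / (5 * shorter_length)


def best_product_score(hsps: List[Hsp], shorter_length: int) -> float:
    """Use the HSP with the highest BLAST score, as the text specifies."""
    matches, mismatches = max(hsps, key=lambda h: blast_score(*h))
    aligned = matches + mismatches
    if aligned == 0:
        raise ValueError("HSP must contain at least one aligned base")
    percent_identity = 100.0 * matches / aligned
    return product_score(blast_score(matches, mismatches), percent_identity, shorter_length)


if __name__ == "__main__":
    # 100% identity over the full length, then over 70% of a 100-base sequence.
    print(best_product_score([(100, 0)], 100))
    print(best_product_score([(70, 0)], 100))
    # 88% and 79% identity over the full length.
    print(best_product_score([(88, 12)], 100))
    print(best_product_score([(79, 21)], 100))
```

Under these assumptions, full-length alignments at 88% and 79% identity come out at roughly 69 and 49, which agrees with the rounded values of 70 and 50 quoted in the text.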

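The primer design criteria quoted above (about 22 to 30 nucleotides in length, GC content of about 50% or more) lend themselves to a simple pre-filter. The sketch below is illustrative only: the function names and default thresholds are assumptions chosen to mirror the approximate figures in the text, and it does not reproduce OLIGO 4.06 or check annealing temperature, hairpins, or primer-primer dimers.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def passes_basic_primer_criteria(
    seq: str,
    min_len: int = 22,    # assumed lower bound, from "about 22 to 30 nucleotides"
    max_len: int = 30,    # assumed upper bound
    min_gc: float = 50.0  # assumed threshold, from "about 50% or more"
) -> bool:
    """Check length and GC content only; other criteria are not modeled here."""
    return min_len <= len(seq) <= max_len and gc_content(seq) >= min_gc


if __name__ == "__main__":
    candidate = "GCGCATATGCGCATATGCGCATAT"  # hypothetical 24-mer with 50% GC
    print(len(candidate), gc_content(candidate), passes_basic_primer_criteria(candidate))
```

Because the text gives the limits as approximate ("about"), a real workflow would likely treat these thresholds as adjustable rather than hard cut-offs.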
Landscapes

  • Chemical & Material Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Gastroenterology & Hepatology (AREA)
  • Biochemistry (AREA)
  • Biophysics (AREA)
  • Zoology (AREA)
  • Genetics & Genomics (AREA)
  • Medicinal Chemistry (AREA)
  • Molecular Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Toxicology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Medicines That Contain Protein Lipid Enzymes And Other Medicines (AREA)

Abstract

The present invention provides purified human polynucleotides for diagnostics and therapeutics (dithp). Also encompassed are the polypeptides (DITHP) encoded by dithp. The invention also provides for the use of dithp, or complements, oligonucleotides, or fragments thereof in diagnostic assays. The invention further provides for vectors and host cells containing dithp for the expression of DITHP. The invention additionally provides for the use of isolated and purified DITHP to induce antibodies and to screen libraries of compounds and the use of anti-DITHP antibodies in diagnostic assays. Also provided are microarrays containing dithp and methods of use.

Description

MOLECULES FOR DIAGNOSTICS AND THERAPEUTICS
TECHNICAL FIELD
The present invention relates to human molecules and to the use of these sequences in the diagnosis, study, prevention, and treatment of diseases associated with, as well as effects of exogenous compounds on, the expression of human molecules.
BACKGROUND OF THE INVENTION
The human genome is comprised of thousands of genes, many encoding gene products that function in the maintenance and growth of the various cells and tissues in the body. Aberrant expression or mutations in these genes and their products is the cause of, or is associated with, a variety of human diseases such as cancer and other cell proliferative disorders, autoimmune/inflammatory disorders, infections, developmental disorders, endocrine disorders, metabolic disorders, neurological disorders, gastrointestinal disorders, transport disorders, and connective tissue disorders. The identification of these genes and their products is the basis of an ever-expanding effort to find markers for early detection of diseases, and targets for their prevention and treatment. Therefore, these genes and their products are useful as diagnostics and therapeutics. These genes may encode, for example, enzyme molecules, molecules associated with growth and development, biochemical pathway molecules, extracellular information transmission molecules, receptor molecules, intracellular signaling molecules, membrane transport molecules, protein modification and maintenance molecules, nucleic acid synthesis and modification molecules, adhesion molecules, antigen recognition molecules, secreted and extracellular matrix molecules, cytoskeletal molecules, ribosomal molecules, electron transfer associated molecules, transcription factor molecules, chromatin molecules, cell membrane molecules, and organelle associated molecules.
For example, cancer represents a type of cell proliferative disorder that affects nearly every tissue in the body. A wide variety of molecules, either aberrantly expressed or mutated, can be the cause of, or involved with, various cancers because tissue growth involves complex and ordered patterns of cell proliferation, cell differentiation, and apoptosis. Cell proliferation must be regulated to maintain both the number of cells and their spatial organization. This regulation depends upon the appropriate expression of proteins which control cell cycle progression in response to extracellular signals such as growth factors and other mitogens, and intracellular cues such as DNA damage or nutrient starvation. Molecules which directly or indirectly modulate cell cycle progression fall into several categories, including growth factors and their receptors, second messenger and signal transduction proteins, oncogene products, tumor-suppressor proteins, and mitosis-promoting factors. Aberrant expression or mutations in any of these gene products can result in cell proliferative disorders such as cancer. Oncogenes are genes generally derived from normal genes that, through abnormal expression or mutation, can effect the transformation of a normal cell to a malignant one (oncogenesis). Oncoproteins, encoded by oncogenes, can affect cell proliferation in a variety of ways and include growth factors, growth factor receptors, intracellular signal transducers, nuclear transcription factors, and cell-cycle control proteins. In contrast, tumor-suppressor genes are involved in inhibiting cell proliferation. Mutations which cause reduced function or loss of function in tumor-suppressor genes result in aberrant cell proliferation and cancer. Although many different genes and their products have been found to be associated with cell proliferative disorders such as cancer, many more may exist that are yet to be discovered.
DNA-based arrays can provide a simple way to explore the expression of a single polymorphic gene or a large number of genes. When the expression of a single gene is explored, DNA-based arrays are employed to detect the expression of specific gene variants. For example, a p53 tumor suppressor gene array is used to determine whether individuals are carrying mutations that predispose them to cancer. A cytochrome p450 gene array is useful to determine whether individuals have one of a number of specific mutations that could result in increased drug metabolism, drug resistance or drug toxicity.
DNA-based array technology is especially relevant for the rapid screening of expression of a large number of genes. There is a growing awareness that gene expression is affected in a global fashion. A genetic predisposition, disease or therapeutic treatment may affect, directly or indirectly, the expression of a large number of genes. In some cases the interactions may be expected, such as when the genes are part of the same signaling pathway. In other cases, such as when the genes participate in separate signaling pathways, the interactions may be totally unexpected. Therefore, DNA-based arrays can be used to investigate how genetic predisposition, disease, or therapeutic treatment affects the expression of a large number of genes.
Enzyme Molecules
The cellular processes of biogenesis and biodegradation involve a number of key enzyme classes including oxidoreductases, transferases, hydrolases, lyases, isomerases, and ligases. These enzyme classes are each comprised of numerous substrate-specific enzymes having precise and well regulated functions. These enzymes function by facilitating metabolic processes such as glycolysis, the tricarboxylic acid cycle, and fatty acid metabolism; synthesis or degradation of amino acids, steroids, phospholipids, alcohols, etc.; regulation of cell signaling, proliferation, inflammation, apoptosis, etc., and through catalyzing critical steps in DNA replication and repair, and the process of translation.
Oxidoreductases
Many pathways of biogenesis and biodegradation require oxidoreductase (dehydrogenase or reductase) activity, coupled to the reduction or oxidation of a donor or acceptor cofactor. Potential cofactors include cytochromes, oxygen, disulfide, iron-sulfur proteins, flavin adenine dinucleotide (FAD), and the nicotinamide adenine dinucleotides NAD and NADP (Newsholme, E.A. and A.R. Leech (1983) Biochemistry for the Medical Sciences, John Wiley and Sons, Chichester, U.K., pp. 779-793). Reductase activity catalyzes the transfer of electrons between substrate(s) and cofactor(s) with concurrent oxidation of the cofactor. The reverse dehydrogenase reaction catalyzes the reduction of a cofactor and consequent oxidation of the substrate. Oxidoreductase enzymes are a broad superfamily of proteins that catalyze numerous reactions in all cells of organisms ranging from bacteria to plants to humans. These reactions include metabolism of sugar, certain detoxification reactions in the liver, and the synthesis or degradation of fatty acids, amino acids, glucocorticoids, estrogens, androgens, and prostaglandins.
Different family members are named according to the direction in which their reactions are typically catalyzed; thus they may be referred to as oxidoreductases, oxidases, reductases, or dehydrogenases. In addition, family members often have distinct cellular localizations, including the cytosol, the plasma membrane, mitochondrial inner or outer membrane, and peroxisomes.
Short-chain alcohol dehydrogenases (SCADs) are a family of dehydrogenases that only share 15% to 30% sequence identity, with similarity predominantly in the coenzyme binding domain and the substrate binding domain. In addition to the well-known role in detoxification of ethanol, SCADs are also involved in synthesis and degradation of fatty acids, steroids, and some prostaglandins, and are therefore implicated in a variety of disorders such as lipid storage disease, myopathy, SCAD deficiency, and certain genetic disorders. For example, retinol dehydrogenase is a SCAD-family member (Simon, A. et al. (1995) J. Biol. Chem. 270:1107-1112) that converts retinol to retinal, the precursor of retinoic acid. Retinoic acid, a regulator of differentiation and apoptosis, has been shown to down-regulate genes involved in cell proliferation and inflammation (Chai, X. et al. (1995) J. Biol. Chem. 270:3900-3904). In addition, retinol dehydrogenase has been linked to hereditary eye diseases such as autosomal recessive childhood-onset severe retinal dystrophy (Simon, A. et al. (1996) Genomics 36:424-430).
Propagation of nerve impulses, modulation of cell proliferation and differentiation, induction of the immune response, and tissue homeostasis involve neurotransmitter metabolism (Weiss, B. (1991) Neurotoxicology 12:379-386; Collins, S.M. et al. (1992) Ann. N.Y. Acad. Sci. 664:415-424; Brown, J.K. and H. Imam (1991) J. Inherit. Metab. Dis. 14:436-458). Many pathways of neurotransmitter metabolism require oxidoreductase activity, coupled to reduction or oxidation of a cofactor, such as NAD+/NADH (Newsholme, E.A. and A.R. Leech (1983) Biochemistry for the Medical Sciences, John Wiley and Sons, Chichester, U.K. pp. 779-793). Degradation of catecholamines (epinephrine or norepinephrine) requires alcohol dehydrogenase (in the brain) or aldehyde dehydrogenase (in peripheral tissue). NAD+-dependent aldehyde dehydrogenase oxidizes 5-hydroxyindole-3-acetate (the product of 5-hydroxytryptamine (serotonin) metabolism) in the brain, blood platelets, liver and pulmonary endothelium (Newsholme, supra, p. 786). Other neurotransmitter degradation pathways that utilize NAD+/NADH-dependent oxidoreductase activity include those of L-DOPA (precursor of dopamine, a neuronal excitatory compound), glycine (an inhibitory neurotransmitter in the brain and spinal cord), histamine (liberated from mast cells during the inflammatory response), and taurine (an inhibitory neurotransmitter of the brain stem, spinal cord and retina) (Newsholme, supra, pp. 790, 792). Epigenetic or genetic defects in neurotransmitter metabolic pathways can result in a spectrum of disease states in different tissues including Parkinson disease and inherited myoclonus (McCance, K.L. and S.E. Huether (1994) Pathophysiology, Mosby-Year Book, Inc., St. Louis MO, pp. 402-404; Gundlach, A.L. (1990) FASEB J. 4:2761-2766).
Tetrahydrofolate is a derivatized glutamate molecule that acts as a carrier, providing activated one-carbon units to a wide variety of biosynthetic reactions, including synthesis of purines, pyrimidines, and the amino acid methionine. Tetrahydrofolate is generated by the activity of a holoenzyme complex called tetrahydrofolate synthase, which includes three enzyme activities: tetrahydrofolate dehydrogenase, tetrahydrofolate cyclohydrolase, and tetrahydrofolate synthetase. Thus, tetrahydrofolate dehydrogenase plays an important role in generating building blocks for nucleic and amino acids, crucial to proliferating cells.
3-Hydroxyacyl-CoA dehydrogenase (3HACD) is involved in fatty acid metabolism. It catalyzes the reduction of 3-hydroxyacyl-CoA to 3-oxoacyl-CoA, with concomitant oxidation of NAD to NADH, in the mitochondria and peroxisomes of eukaryotic cells. In peroxisomes, 3HACD and enoyl-CoA hydratase form an enzyme complex called bifunctional enzyme, defects in which are associated with peroxisomal bifunctional enzyme deficiency. This interruption in fatty acid metabolism produces accumulation of very-long chain fatty acids, disrupting development of the brain, bone, and adrenal glands. Infants born with this deficiency typically die within 6 months (Watkins, P. et al. (1989) J. Clin. Invest. 83:771-777; Online Mendelian Inheritance in Man (OMIM), #261515). The neurodegeneration that is characteristic of Alzheimer's disease involves development of extracellular plaques in certain brain regions. A major protein component of these plaques is the peptide amyloid-β (Aβ), which is one of several cleavage products of amyloid precursor protein (APP). 3HACD has been shown to bind the Aβ peptide, and is overexpressed in neurons affected in Alzheimer's disease. In addition, an antibody against 3HACD can block the toxic effects of Aβ in a cell culture model of Alzheimer's disease (Yan, S. et al. (1997) Nature 389:689-695; OMIM, #602057).
Steroids, such as estrogen, testosterone, corticosterone, and others, are generated from a common precursor, cholesterol, and are interconverted into one another. A wide variety of enzymes act upon cholesterol, including a number of dehydrogenases. Steroid dehydrogenases, such as the hydroxysteroid dehydrogenases, are involved in hypertension, fertility, and cancer (Duax, W.L. and D. Ghosh (1997) Steroids 62:95-100). One such dehydrogenase is 3-oxo-5-α-steroid dehydrogenase (OASD), a microsomal membrane protein highly expressed in prostate and other androgen-responsive tissues. OASD catalyzes the conversion of testosterone into dihydrotestosterone, which is the most potent androgen. Dihydrotestosterone is essential for the formation of the male phenotype during embryogenesis, as well as for proper androgen-mediated growth of tissues such as the prostate and male genitalia. A defect in OASD that prevents the conversion of testosterone into dihydrotestosterone leads to a rare form of male pseudohermaphroditism, characterized by defective formation of the external genitalia (Andersson, S. et al. (1991) Nature 354:159-161; Labrie, F. et al. (1992) Endocrinology 131:1571-1573; OMIM #264600). Thus, OASD plays a central role in sexual differentiation and androgen physiology.
17β-hydroxysteroid dehydrogenase (17βHSD6) plays an important role in the regulation of the male reproductive hormone, dihydrotestosterone (DHTT). 17βHSD6 acts to reduce levels of DHTT by oxidizing a precursor of DHTT, 3α-diol, to androsterone which is readily glucuronidated and removed from tissues. 17βHSD6 is active with both androgen and estrogen substrates when expressed in embryonic kidney 293 cells. At least five other isozymes of 17βHSD have been identified that catalyze oxidation and/or reduction reactions in various tissues with preferences for different steroid substrates (Biswas, M.G. and D.W. Russell (1997) J. Biol. Chem. 272:15959-15966). For example, 17βHSD1 preferentially reduces estradiol and is abundant in the ovary and placenta. 17βHSD2 catalyzes oxidation of androgens and is present in the endometrium and placenta. 17βHSD3 is exclusively a reductive enzyme in the testis (Geissler, W.M. et al. (1994) Nat. Genet. 7:34-39). An excess of androgens such as DHTT can contribute to certain disease states such as benign prostatic hyperplasia and prostate cancer.
Oxidoreductases are components of the fatty acid metabolism pathways in mitochondria and peroxisomes. The main beta-oxidation pathway degrades both saturated and unsaturated fatty acids, while the auxiliary pathway performs additional steps required for the degradation of unsaturated fatty acids. The auxiliary beta-oxidation enzyme 2,4-dienoyl-CoA reductase catalyzes the removal of even-numbered double bonds from unsaturated fatty acids prior to their entry into the main beta-oxidation pathway. The enzyme may also remove odd-numbered double bonds from unsaturated fatty acids (Koivuranta, K.T. et al. (1994) Biochem. J. 304:787-792; Smeland, T.E. et al. (1992) Proc. Natl. Acad. Sci. USA 89:6673-6677). 2,4-dienoyl-CoA reductase is located in both mitochondria and peroxisomes.
Inherited deficiencies in mitochondrial and peroxisomal beta-oxidation enzymes are associated with severe diseases, some of which manifest themselves soon after birth and lead to death within a few years. Defects in beta-oxidation are associated with Reye's syndrome, Zellweger syndrome, neonatal adrenoleukodystrophy, infantile Refsum's disease, acyl-CoA oxidase deficiency, and bifunctional protein deficiency (Suzuki, Y. et al. (1994) Am. J. Hum. Genet. 54:36-43; Hoefler, supra; Cotran, R.S. et al. (1994) Robbins Pathologic Basis of Disease, W.B. Saunders Co., Philadelphia PA, p.866). Peroxisomal beta-oxidation is impaired in cancerous tissue. Although neoplastic human breast epithelial cells have the same number of peroxisomes as do normal cells, fatty acyl-CoA oxidase activity is lower than in control tissue (el Bouhtoury, F. et al. (1992) J. Pathol. 166:27-35). Human colon carcinomas have fewer peroxisomes than normal colon tissue and have lower fatty-acyl-CoA oxidase and bifunctional enzyme (including enoyl-CoA hydratase) activities than normal tissue (Cable, S. et al. (1992) Virchows Arch. B Cell Pathol. Incl. Mol. Pathol. 62:221-226).
Another important oxidoreductase is isocitrate dehydrogenase, which catalyzes the conversion of isocitrate to α-ketoglutarate, a substrate of the citric acid cycle. Isocitrate dehydrogenase can be either NAD or NADP dependent, and is found in the cytosol, mitochondria, and peroxisomes. Activity of isocitrate dehydrogenase is regulated developmentally, and by hormones, neurotransmitters, and growth factors.
Hydroxypyruvate reductase (HPR), a peroxisomal 2-hydroxyacid dehydrogenase in the glycolate pathway, catalyzes the conversion of hydroxypyruvate to glycerate with the oxidation of both NADH and NADPH. The reverse dehydrogenase reaction reduces NAD+ and NADP+. HPR recycles nucleotides and bases back into pathways leading to the synthesis of ATP and GTP.
ATP and GTP are used to produce DNA and RNA and to control various aspects of signal transduction and energy metabolism. Inhibitors of purine nucleotide biosynthesis have long been employed as antiproliferative agents to treat cancer and viral diseases. HPR also regulates biochemical synthesis of serine and cellular serine levels available for protein synthesis.
The mitochondrial electron transport (or respiratory) chain is a series of oxidoreductase-type enzyme complexes in the mitochondrial membrane that is responsible for the transport of electrons from NADH through a series of redox centers within these complexes to oxygen, and the coupling of this oxidation to the synthesis of ATP (oxidative phosphorylation). ATP then provides the primary source of energy for driving a cell's many energy-requiring reactions. The key complexes in the respiratory chain are NADH:ubiquinone oxidoreductase (complex I), succinate:ubiquinone oxidoreductase (complex II), cytochrome c1-b oxidoreductase (complex III), cytochrome c oxidase (complex IV), and ATP synthase (complex V) (Alberts, B. et al. (1994) Molecular Biology of the Cell, Garland Publishing, Inc., New York NY, pp. 677-678). All of these complexes are located on the inner matrix side of the mitochondrial membrane except complex II, which is on the cytosolic side. Complex II transports electrons generated in the citric acid cycle to the respiratory chain. The electrons generated by oxidation of succinate to fumarate in the citric acid cycle are transferred through electron carriers in complex II to membrane bound ubiquinone (Q). Transcriptional regulation of these nuclear-encoded genes appears to be the predominant means for controlling the biogenesis of respiratory enzymes. Defects and altered expression of enzymes in the respiratory chain are associated with a variety of disease conditions.
Other dehydrogenase activities using NAD as a cofactor are also important in mitochondrial function. 3-hydroxyisobutyrate dehydrogenase (3HBD), important in valine catabolism, catalyzes the NAD-dependent oxidation of 3-hydroxyisobutyrate to methylmalonate semialdehyde within mitochondria. Elevated levels of 3-hydroxyisobutyrate have been reported in a number of disease states, including ketoacidosis, methylmalonic acidemia, and other disorders associated with deficiencies in methylmalonate semialdehyde dehydrogenase (Rougraff, P.M. et al. (1989) J. Biol. Chem. 264:5899-5903).
Another mitochondrial dehydrogenase important in amino acid metabolism is the enzyme isovaleryl-CoA-dehydrogenase (IVD). IVD is involved in leucine metabolism and catalyzes the oxidation of isovaleryl-CoA to 3-methylcrotonyl-CoA. Human IVD is a tetrameric flavoprotein that is encoded in the nucleus and synthesized in the cytosol as a 45 kDa precursor with a mitochondrial import signal sequence. A genetic deficiency, caused by a mutation in the gene encoding IVD, results in the condition known as isovaleric acidemia. This mutation results in inefficient mitochondrial import and processing of the IVD precursor (Vockley, J. et al. (1992) J. Biol. Chem. 267:2494-2501).
Transferases
Transferases are enzymes that catalyze the transfer of molecular groups. The reaction may involve an oxidation, reduction, or cleavage of covalent bonds, and is often specific to a substrate or to particular sites on a type of substrate. Transferases participate in reactions essential to such functions as synthesis and degradation of cell components, regulation of cell functions including cell signaling, cell proliferation, inflamation, apoptosis, secretion and excretion. Transferases are involved in key steps in disease processes involving these functions. Transferases are frequently classified according to the type of group transferred. For example, methyl transferases transfer one-carbon methyl groups, amino transferases transfer nitrogenous amino groups, and similarly denominated enzymes transfer aldehyde or ketone, acyl, glycosyl, alkyl or aryl, isoprenyl, saccharyl, phosphorous-containing, sulfur- containing, or selenium-containing groups, as well as small enzymatic groups such as Coenzyme A. Acyl transferases include peroxisomal carnitine octanoyl transferase, which is involved in the fatty acid beta-oxidation pathway, and mitochondrial carnitine palmitoyl transferases, involved in fatty acid metabolism and transport. Choline O-acetyl transferase catalyzes the biosynthesis of the neurotransmitter acetylcholine.
Amino transferases play key roles in protein synthesis and degradation, and they contribute to other processes as well. For example, the amino transferase 5-aminolevulinic acid synthase catalyzes the addition of succinyl-CoA to glycine, the first step in heme biosynthesis. Other amino transferases participate in pathways important for neurological function and metabolism. For example, glutamine-phenylpyruvate amino transferase, also known as glutamine transaminase K (GTK), catalyzes several reactions with a pyridoxal phosphate cofactor. GTK catalyzes the reversible conversion of L-glutamine and phenylpyruvate to 2-oxoglutaramate and L-phenylalanine. Other amino acid substrates for GTK include L-methionine, L-histidine, and L-tyrosine. GTK also catalyzes the conversion of kynurenine to kynurenic acid, a tryptophan metabolite that is an antagonist of the N-methyl-D-aspartate (NMDA) receptor in the brain and may exert a neuromodulatory function. Alteration of the kynurenine metabolic pathway may be associated with several neurological disorders. GTK also plays a role in the metabolism of halogenated xenobiotics conjugated to glutathione, leading to nephrotoxicity in rats and neurotoxicity in humans. GTK is expressed in kidney, liver, and brain. Both human and rat GTKs contain a putative pyridoxal phosphate binding site (ExPASy ENZYME: EC 2.6.1.64; Perry, S.J. et al. (1993) Mol. Pharmacol. 43:660-665; Perry, S. et al. (1995) FEBS Lett. 360:277-280; and Alberati-Giani, D. et al. (1995) J. Neurochem. 64:1448-1455). A second amino transferase associated with this pathway is kynurenine/α-aminoadipate amino transferase (AadAT). AadAT catalyzes the reversible conversion of α-aminoadipate and α-ketoglutarate to α-ketoadipate and L-glutamate during lysine metabolism. AadAT also catalyzes the transamination of kynurenine to kynurenic acid. A cytosolic AadAT is expressed in rat kidney, liver, and brain (Nakatani, Y. et al. (1970) Biochim. Biophys. Acta 198:219-228; Buchli, R. et al. (1995) J. Biol. Chem. 270:29330-29335).
Glycosyl transferases include the mammalian UDP-glucuronosyl transferases, a family of membrane-bound microsomal enzymes catalyzing the transfer of glucuronic acid to lipophilic substrates in reactions that play important roles in detoxification and excretion of drugs, carcinogens, and other foreign substances. Another mammalian glycosyl transferase, UDP-galactose-ceramide galactosyl transferase, catalyzes the transfer of galactose to ceramide in the synthesis of galactocerebrosides in myelin membranes of the nervous system. The UDP-glycosyl transferases share a conserved signature domain of about 50 amino acid residues (PROSITE: PDOC00359, http://expasy.hcuge.ch/sprot/prosite.html).
Methyl transferases are involved in a variety of pharmacologically important processes. Nicotinamide N-methyl transferase catalyzes the N-methylation of nicotinamides and other pyridines, an important step in the cellular handling of drugs and other foreign compounds. Phenylethanolamine N-methyl transferase catalyzes the conversion of noradrenalin to adrenalin. 6-O-methylguanine-DNA methyl transferase reverses DNA methylation, an important step in carcinogenesis. Uroporphyrin-III C-methyl transferase, which catalyzes the transfer of two methyl groups from S-adenosyl-L-methionine to uroporphyrinogen III, is the first specific enzyme in the biosynthesis of cobalamin, a dietary vitamin whose uptake is deficient in pernicious anemia. Protein-arginine methyl transferases catalyze the posttranslational methylation of arginine residues in proteins, resulting in the mono- and dimethylation of arginine on the guanidino group. Substrates include histones, myelin basic protein, and heterogeneous nuclear ribonucleoproteins involved in mRNA processing, splicing, and transport. Protein-arginine methyl transferase interacts with proteins upregulated by mitogens, with proteins involved in chronic lymphocytic leukemia, and with interferon, suggesting an important role for methylation in cytokine receptor signaling (Lin, W.-J. et al. (1996) J. Biol. Chem. 271:15034-15044; Abramovich, C. et al. (1997) EMBO J. 16:260-266; and Scott, H.S. et al. (1998) Genomics 48:330-340).
Phosphotransferases catalyze the transfer of high-energy phosphate groups and are important in energy-requiring and -releasing reactions. The metabolic enzyme creatine kinase catalyzes the reversible phosphate transfer between creatine/creatine phosphate and ATP/ADP. Glycocyamine kinase catalyzes phosphate transfer from ATP to guanidoacetate, and arginine kinase catalyzes phosphate transfer from ATP to arginine. A cysteine-containing active site is conserved in this family (PROSITE: PDOC00103).
Prenyl transferases are heterodimers, consisting of an alpha and a beta subunit, that catalyze the transfer of an isoprenyl group. An example of a prenyl transferase is the mammalian protein farnesyl transferase. The alpha subunit of farnesyl transferase consists of 5 repeats of 34 amino acids each, with each repeat containing an invariant tryptophan (PROSITE: PDOC00703).
Saccharyl transferases are glycating enzymes involved in a variety of metabolic processes. Oligosaccharyl transferase-48, for example, is a receptor for advanced glycation endproducts. Accumulation of these endproducts is observed in vascular complications of diabetes, macrovascular disease, renal insufficiency, and Alzheimer's disease (Thornalley, P.J. (1998) Cell Mol. Biol. (Noisy-le-Grand) 44:1013-1023).
Coenzyme A (CoA) transferase catalyzes the transfer of CoA between two carboxylic acids. Succinyl CoA:3-oxoacid CoA transferase, for example, transfers CoA from succinyl-CoA to a recipient such as acetoacetate. Acetoacetate is essential to the metabolism of ketone bodies, which accumulate in tissues affected by metabolic disorders such as diabetes (PROSITE: PDOC00980).
Hydrolases
Hydrolysis is the breaking of a covalent bond in a substrate by introduction of a molecule of water. The reaction involves a nucleophilic attack by the water molecule's oxygen atom on a target bond in the substrate. The water molecule is split across the target bond, breaking the bond and generating two product molecules. Hydrolases participate in reactions essential to such functions as synthesis and degradation of cell components, and for regulation of cell functions including cell signaling, cell proliferation, inflammation, apoptosis, secretion and excretion. Hydrolases are involved in key steps in disease processes involving these functions. Hydrolytic enzymes, or hydrolases, may be grouped by substrate specificity into classes including phosphatases, peptidases, lysophospholipases, phosphodiesterases, glycosidases, and glyoxalases.
Phosphatases hydrolytically remove phosphate groups from proteins, an energy-providing step that regulates many cellular processes, including intracellular signaling pathways that in turn control cell growth and differentiation, cell-cell contact, the cell cycle, and oncogenesis. Lysophospholipases (LPLs) regulate intracellular lipids by catalyzing the hydrolysis of ester bonds to remove an acyl group, a key step in lipid degradation. Small LPL isoforms, approximately 15-30 kD, function as hydrolases; larger isoforms function both as hydrolases and transacylases. A particular substrate for LPLs, lysophosphatidylcholine, causes lysis of cell membranes. LPL activity is regulated by signaling molecules important in numerous pathways, including the inflammatory response.
Peptidases, also called proteases, cleave peptide bonds that form the backbone of peptide or protein chains. Proteolytic processing is essential to cell growth, differentiation, remodeling, and homeostasis as well as inflammation and immune response. Since typical protein half-lives range from hours to a few days, peptidases are continually cleaving precursor proteins to their active form, removing signal sequences from targeted proteins, and degrading aged or defective proteins.
Peptidases function in bacterial, parasitic, and viral invasion and replication within a host. Examples of peptidases include trypsin and chymotrypsin (components of the complement cascade and the blood-clotting cascade), lysosomal cathepsins, calpains, pepsin, renin, and chymosin (Beynon, R.J. and J.S. Bond (1994) Proteolytic Enzymes: A Practical Approach. Oxford University Press, New York NY, pp. 1-5).
The phosphodiesterases catalyze the hydrolysis of one of the two ester bonds in a phosphodiester compound. Phosphodiesterases are therefore crucial to a variety of cellular processes. Phosphodiesterases include DNA and RNA endo- and exo-nucleases, which are essential to cell growth and replication as well as protein synthesis. Another phosphodiesterase is acid sphingomyelinase, which hydrolyzes the membrane phospholipid sphingomyelin to ceramide and phosphorylcholine. Phosphorylcholine is used in the synthesis of phosphatidylcholine, which is involved in numerous intracellular signaling pathways. Ceramide is an essential precursor for the generation of gangliosides, membrane lipids found in high concentration in neural tissue. Defective acid sphingomyelinase phosphodiesterase leads to a build-up of sphingomyelin molecules in lysosomes, resulting in Niemann-Pick disease.
Glycosidases catalyze the cleavage of hemiacetal bonds of glycosides, which are compounds that contain one or more sugars. Mammalian lactase-phlorizin hydrolase, for example, is an intestinal enzyme that splits lactose. Mammalian beta-galactosidase removes the terminal galactose from gangliosides, glycoproteins, and glycosaminoglycans, and deficiency of this enzyme is associated with a gangliosidosis known as Morquio disease type B. Vertebrate lysosomal alpha-glucosidase, which hydrolyzes glycogen, maltose, and isomaltose, and vertebrate intestinal sucrase-isomaltase, which hydrolyzes sucrose, maltose, and isomaltose, are widely distributed members of this family with highly conserved sequences at their active sites.
The glyoxalase system is involved in gluconeogenesis, the production of glucose from storage compounds in the body. It consists of glyoxalase I, which catalyzes the formation of S-D-lactoylglutathione from methylglyoxal, a side product of triose-phosphate energy metabolism, and glyoxalase II, which hydrolyzes S-D-lactoylglutathione to D-lactic acid and reduced glutathione. Glyoxalases are involved in hyperglycemia, non-insulin-dependent diabetes mellitus, the detoxification of bacterial toxins, and in the control of cell proliferation and microtubule assembly.
Lyases
Lyases are a class of enzymes that catalyze the cleavage of C-C, C-O, C-N, C-S, C-(halide), P-O or other bonds without hydrolysis or oxidation to form two molecules, at least one of which contains a double bond (Stryer, L. (1995) Biochemistry, W.H. Freeman and Co., New York NY, p. 620). Lyases are critical components of cellular biochemistry with roles in metabolic energy production including fatty acid metabolism, as well as other diverse enzymatic processes. Further classification of lyases reflects the type of bond cleaved as well as the nature of the cleaved group.
The group of C-C lyases includes carboxyl-lyases (decarboxylases), aldehyde-lyases (aldolases), oxo-acid-lyases and others. The C-O lyase group includes hydro-lyases, lyases acting on polysaccharides and other lyases. The C-N lyase group includes ammonia-lyases, amidine-lyases, amine-lyases (deaminases) and other lyases.
Proper regulation of lyases is critical to normal physiology. For example, mutation-induced deficiencies in uroporphyrinogen decarboxylase can lead to photosensitive cutaneous lesions in the genetically-linked disorder familial porphyria cutanea tarda (Mendez, M. et al. (1998) Am. J. Genet. 63:1363-1375). It has also been shown that adenosine deaminase (ADA) deficiency stems from genetic mutations in the ADA gene, resulting in the disorder severe combined immunodeficiency disease (SCID) (Hershfield, M.S. (1998) Semin. Hematol. 35:291-298).
Isomerases
Isomerases are a class of enzymes that catalyze geometric or structural changes within a molecule to form a single product. This class includes racemases and epimerases, cis-trans-isomerases, intramolecular oxidoreductases, intramolecular transferases (mutases) and intramolecular lyases. Isomerases are critical components of cellular biochemistry with roles in metabolic energy production including glycolysis, as well as other diverse enzymatic processes (Stryer, L. (1995) Biochemistry, W.H. Freeman and Co., New York NY, pp. 483-507).
Racemases are a subset of isomerases that catalyze inversion of a molecule's configuration around the asymmetric carbon atom in a substrate having a single center of asymmetry, thereby interconverting two racemers. Epimerases are another subset of isomerases that catalyze inversion of configuration around an asymmetric carbon atom in a substrate with more than one center of asymmetry, thereby interconverting two epimers. Racemases and epimerases can act on amino acids and derivatives, hydroxy acids and derivatives, as well as carbohydrates and derivatives. The interconversion of UDP-galactose and UDP-glucose is catalyzed by UDP-galactose-4'-epimerase. Proper regulation and function of this epimerase is essential to the synthesis of glycoproteins and glycolipids. Elevated blood galactose levels have been correlated with UDP-galactose-4'-epimerase deficiency in screening programs of infants (Gitzelmann, R. (1972) Helv. Paediat. Acta 27:125-130).
Oxidoreductases can be isomerases as well. Oxidoreductases catalyze the reversible transfer of electrons from a substrate that becomes oxidized to a substrate that becomes reduced. This class of enzymes includes dehydrogenases, hydroxylases, oxidases, oxygenases, peroxidases, and reductases. Proper maintenance of oxidoreductase levels is physiologically important. For example, genetically-linked deficiencies in lipoamide dehydrogenase can result in lactic acidosis (Robinson, B.H. et al. (1977) Pediat. Res. 11:1198-1202).
Another subgroup of isomerases are the transferases (or mutases). Transferases transfer a chemical group from one compound (the donor) to another compound (the acceptor). The types of groups transferred by these enzymes include acyl groups, amino groups, phosphate groups (phosphotransferases or phosphomutases), and others. The transferase carnitine palmitoyltransferase is an important component of fatty acid metabolism. Genetically-linked deficiencies in this transferase can lead to myopathy (Scriver, C.R. et al. (1995) The Metabolic and Molecular Basis of Inherited Disease, McGraw-Hill, New York NY, pp. 1501-1533).
Yet another subgroup of isomerases are the topoisomerases. Topoisomerases are enzymes that affect the topological state of DNA. Defects in topoisomerases or their regulation can affect normal physiology. For example, reduced levels of topoisomerase II have been correlated with some of the DNA processing defects associated with the disorder ataxia-telangiectasia (Singh, S.P. et al. (1988) Nucleic Acids Res. 16:3919-3929).
Ligases
Ligases catalyze the formation of a bond between two substrate molecules. The process involves the hydrolysis of a pyrophosphate bond in ATP or a similar energy donor. Ligases are classified according to the type of bond they form, which can include carbon-oxygen, carbon-sulfur, carbon-nitrogen, carbon-carbon and phosphoric ester bonds.
Ligases forming carbon-oxygen bonds include the aminoacyl-transfer RNA (tRNA) synthetases, which are important RNA-associated enzymes with roles in translation. Protein biosynthesis depends on each amino acid forming a linkage with the appropriate tRNA. The aminoacyl-tRNA synthetases are responsible for the activation and correct attachment of an amino acid with its cognate tRNA. The 20 aminoacyl-tRNA synthetase enzymes can be divided into two structural classes, and each class is characterized by a distinctive topology of the catalytic domain. Class I enzymes contain a catalytic domain based on the nucleotide-binding Rossmann fold. Class II enzymes contain a central catalytic domain, which consists of a seven-stranded antiparallel β-sheet motif, as well as N- and C-terminal regulatory domains. Class II enzymes are separated into two groups based on the heterodimeric or homodimeric structure of the enzyme; the latter group is further subdivided by the structure of the N- and C-terminal regulatory domains (Hartlein, M. and S. Cusack (1995) J. Mol. Evol. 40:519-530). Autoantibodies against aminoacyl-tRNAs are generated by patients with dermatomyositis and polymyositis, and correlate strongly with complicating interstitial lung disease (ILD). These antibodies appear to be generated in response to viral infection, and coxsackie virus has been used to induce experimental viral myositis in animals.
Ligases forming carbon-sulfur bonds (acid-thiol ligases) mediate a large number of cellular biosynthetic and intermediary metabolism processes that involve intermolecular transfer of carbon atom-containing substrates (carbon substrates). Examples of such reactions include the tricarboxylic acid cycle, synthesis of fatty acids and long-chain phospholipids, synthesis of alcohols and aldehydes, synthesis of intermediary metabolites, and reactions involved in the amino acid degradation pathways. Some of these reactions require input of energy, usually in the form of conversion of ATP to either ADP or AMP and pyrophosphate.
In many cases, a carbon substrate is derived from a small molecule containing at least two carbon atoms. The carbon substrate is often covalently bound to a larger molecule which acts as a carbon substrate carrier molecule within the cell. In the biosynthetic mechanisms described above, the carrier molecule is coenzyme A. Coenzyme A (CoA) is structurally related to derivatives of the nucleotide ADP and consists of 4'-phosphopantetheine linked via a phosphodiester bond to the alpha phosphate group of adenosine 3',5'-bisphosphate. The terminal thiol group of 4'-phosphopantetheine acts as the site for carbon substrate bond formation. The predominant carbon substrates which utilize CoA as a carrier molecule during biosynthesis and intermediary metabolism in the cell are acetyl, succinyl, and propionyl moieties, collectively referred to as acyl groups. Other carbon substrates include enoyl lipid, which acts as a fatty acid oxidation intermediate, and carnitine, which acts as an acetyl-CoA flux regulator/mitochondrial acyl group transfer protein. Acyl-CoA and acetyl-CoA are synthesized in the cell by acyl-CoA synthetase and acetyl-CoA synthetase, respectively.
Activation of fatty acids is mediated by at least three forms of acyl-CoA synthetase activity: i) acetyl-CoA synthetase, which activates acetate and several other low molecular weight carboxylic acids and is found in muscle mitochondria and the cytosol of other tissues; ii) medium-chain acyl-CoA synthetase, which activates fatty acids containing between four and eleven carbon atoms (predominantly from dietary sources), and is present only in liver mitochondria; and iii) acyl-CoA synthetase, which is specific for long-chain fatty acids with between six and twenty carbon atoms, and is found in microsomes and the mitochondria. Proteins associated with acyl-CoA synthetase activity have been identified from many sources including bacteria, yeast, plants, mouse, and man. The activity of acyl-CoA synthetase may be modulated by phosphorylation of the enzyme by cAMP-dependent protein kinase.
Ligases forming carbon-nitrogen bonds include amide synthases such as glutamine synthetase (glutamate-ammonia ligase) that catalyzes the amination of glutamic acid to glutamine by ammonia using the energy of ATP hydrolysis. Glutamine is the primary source for the amino group in various amide transfer reactions involved in de novo pyrimidine nucleotide synthesis and in purine and pyrimidine ribonucleotide interconversions. Overexpression of glutamine synthetase has been observed in primary liver cancer (Christa, L. et al. (1994) Gastroent. 106:1312-1320).
Acid-amino-acid ligases (peptide synthases) are represented by the ubiquitin proteases which are associated with the ubiquitin conjugation system (UCS), a major pathway for the degradation of cellular proteins in eukaryotic cells and some bacteria. The UCS mediates the elimination of abnormal proteins and regulates the half-lives of important regulatory proteins that control cellular processes such as gene transcription and cell cycle progression. In the UCS pathway, proteins targeted for degradation are conjugated to ubiquitin (Ub), a small heat-stable protein. Ub is first activated by a ubiquitin-activating enzyme (E1), and then transferred to one of several Ub-conjugating enzymes (E2). E2 then links the Ub molecule through its C-terminal glycine to an internal lysine (acceptor lysine) of a target protein. The ubiquitinated protein is then recognized and degraded by the proteasome, a large, multisubunit proteolytic enzyme complex, and ubiquitin is released for reutilization by ubiquitin protease. The UCS is implicated in the degradation of mitotic cyclic kinases, oncoproteins, tumor suppressor genes such as p53, viral proteins, cell surface receptors associated with signal transduction, transcriptional regulators, and mutated or damaged proteins (Ciechanover, A. (1994) Cell 79:13-21). A murine proto-oncogene, Unp, encodes a nuclear ubiquitin protease whose overexpression leads to oncogenic transformation of NIH3T3 cells, and the human homolog of this gene is consistently elevated in small cell tumors and adenocarcinomas of the lung (Gray, D.A. (1995) Oncogene 10:2179-2183).
Cyclo-ligases and other carbon-nitrogen ligases comprise various enzymes and enzyme complexes that participate in the de novo pathways to purine and pyrimidine biosynthesis. Because these pathways are critical to the synthesis of nucleotides for replication of both RNA and DNA, many of these enzymes have been the targets of clinical agents for the treatment of cell proliferative disorders such as cancer and infectious diseases.
Purine biosynthesis occurs de novo from the amino acids glycine and glutamine, and other small molecules. Three of the key reactions in this process are catalyzed by a trifunctional enzyme composed of glycinamide-ribonucleotide synthetase (GARS), aminoimidazole ribonucleotide synthetase (AIRS), and glycinamide ribonucleotide transformylase (GART). Together these three enzymes combine ribosylamine phosphate with glycine to yield phosphoribosyl aminoimidazole, a precursor to both adenylate and guanylate nucleotides. This trifunctional protein has been implicated in the pathology of Down's syndrome (Aimi, J. et al. (1990) Nucleic Acids Res. 18:6665-6672).
Adenylosuccinate synthetase catalyzes a later step in purine biosynthesis that converts inosinic acid to adenylosuccinate, a key step on the path to ATP synthesis. This enzyme is also similar to another carbon-nitrogen ligase, argininosuccinate synthetase, that catalyzes a similar reaction in the urea cycle (Powell, S.M. et al. (1992) FEBS Lett. 303:4-10).
Like the de novo biosynthesis of purines, de novo synthesis of the pyrimidine nucleotides uridylate and cytidylate also arises from a common precursor, in this instance the nucleotide orotidylate derived from orotate and phosphoribosyl pyrophosphate (PRPP). Again a trifunctional enzyme comprising three carbon-nitrogen ligases plays a key role in the process. In this case the enzymes aspartate transcarbamylase (ATCase), carbamyl phosphate synthetase II, and dihydroorotase (DHOase) are encoded by a single gene called CAD. Together these three enzymes combine the initial reactants in pyrimidine biosynthesis, glutamine, CO2 and ATP to form dihydroorotate, the precursor to orotate and orotidylate (Iwahana, H. et al. (1996) Biochem. Biophys. Res. Commun. 219:249-255). Further steps then lead to the synthesis of uridine nucleotides from orotidylate. Cytidine nucleotides are derived from uridine-5'-triphosphate (UTP) by the amidation of UTP using glutamine as the amino donor and the enzyme CTP synthetase. Regulatory mutations in the human CTP synthetase are believed to confer multi-drug resistance to agents widely used in cancer therapy (Yamauchi, M. et al. (1990) EMBO J. 9:2095-2099).
Ligases forming carbon-carbon bonds include the carboxylases acetyl-CoA carboxylase and pyruvate carboxylase. Acetyl-CoA carboxylase catalyzes the carboxylation of acetyl-CoA from CO2 and H2O using the energy of ATP hydrolysis. Acetyl-CoA carboxylase is the rate-limiting step in the biogenesis of long-chain fatty acids. Two isoforms of acetyl-CoA carboxylase, types I and II, are expressed in human in a tissue-specific manner (Ha, J. et al. (1994) Eur. J. Biochem. 219:297-306). Pyruvate carboxylase is a nuclear-encoded mitochondrial enzyme that catalyzes the conversion of pyruvate to oxaloacetate, a key intermediate in the citric acid cycle.
Ligases forming phosphoric ester bonds include the DNA ligases involved in both DNA replication and repair. DNA ligases seal phosphodiester bonds between two adjacent nucleotides in a DNA chain using the energy from ATP hydrolysis to first activate the free 5'-phosphate of one nucleotide and then react it with the 3'-OH group of the adjacent nucleotide. This resealing reaction is used both in DNA replication, to join small DNA fragments called Okazaki fragments that are transiently formed in the process of replicating new DNA, and in DNA repair. DNA repair is the process by which accidental base changes, such as those produced by oxidative damage, hydrolytic attack, or uncontrolled methylation of DNA, are corrected before replication or transcription of the DNA can occur. Bloom's syndrome is an inherited human disease in which individuals are partially deficient in DNA ligation and consequently have an increased incidence of cancer (Alberts, B. et al. (1994) The Molecular Biology of the Cell, Garland Publishing Inc., New York NY, p. 247).
Molecules Associated with Growth and Development
Human growth and development requires the spatial and temporal regulation of cell differentiation, cell proliferation, and apoptosis. These processes coordinately control reproduction, aging, embryogenesis, morphogenesis, organogenesis, and tissue repair and maintenance. At the cellular level, growth and development is governed by the cell's decision to enter into or exit from the cell division cycle and by the cell's commitment to a terminally differentiated state. These decisions are made by the cell in response to extracellular signals and other environmental cues it receives. The following discussion focuses on the molecular mechanisms of cell division, reproduction, cell differentiation and proliferation, apoptosis, and aging.
Cell Division
Cell division is the fundamental process by which all living things grow and reproduce. In unicellular organisms such as yeast and bacteria, each cell division doubles the number of organisms, while in multicellular species many rounds of cell division are required to replace cells lost by wear or by programmed cell death, and for cell differentiation to produce a new tissue or organ. Details of the cell division cycle may vary, but the basic process consists of three principal events. The first event, interphase, involves preparations for cell division, replication of the DNA, and production of essential proteins. In the second event, mitosis, the nuclear material is divided and separates to opposite sides of the cell. The final event, cytokinesis, is division and fission of the cell cytoplasm. The sequence and timing of cell cycle transitions is under the control of the cell cycle regulation system, which controls the process by positive or negative regulatory circuits at various check points.
Regulated progression of the cell cycle depends on the integration of growth control pathways with the basic cell cycle machinery. Cell cycle regulators have been identified by selecting for human and yeast cDNAs that block or activate cell cycle arrest signals in the yeast mating pheromone pathway when they are overexpressed. Known regulators include human CPR (cell cycle progression restoration) genes, such as CPR8 and CPR2, and yeast CDC (cell division control) genes, including CDC91, that block the arrest signals. The CPR genes express a variety of proteins including cyclins, tumor suppressor binding proteins, chaperones, transcription factors, translation factors, and RNA-binding proteins (Edwards, M.C. et al. (1997) Genetics 147:1063-1076). Several cell cycle transitions, including the entry and exit of a cell from mitosis, are dependent upon the activation and inhibition of cyclin-dependent kinases (Cdks). The Cdks are composed of a kinase subunit, Cdk, and an activating subunit, cyclin, in a complex that is subject to many levels of regulation. There appears to be a single Cdk in Saccharomyces cerevisiae and Schizosaccharomyces pombe whereas mammals have a variety of specialized Cdks. Cyclins act by binding to and activating cyclin-dependent protein kinases which then phosphorylate and activate selected proteins involved in the mitotic process. The Cdk-cyclin complex is both positively and negatively regulated by phosphorylation, and by targeted degradation involving molecules such as CDC4 and CDC53. In addition, Cdks are further regulated by binding to inhibitors and other proteins such as Suc1 that modify their specificity or accessibility to regulators (Patra, D. and W.G. Dunphy (1996) Genes Dev. 10:1503-1515; and Mathias, N. et al. (1996) Mol. Cell Biol. 16:6634-6643).
Reproduction
The male and female reproductive systems are complex and involve many aspects of growth and development. The anatomy and physiology of the male and female reproductive systems are reviewed in Guyton, A.C. (1991) Textbook of Medical Physiology, W.B. Saunders Co., Philadelphia PA, pp. 899-928.
The male reproductive system includes the process of spermatogenesis, in which the sperm are formed, and male reproductive functions are regulated by various hormones and their effects on accessory sexual organs, cellular metabolism, growth, and other bodily functions.
Spermatogenesis begins at puberty as a result of stimulation by gonadotropic hormones released from the anterior pituitary. Immature sperm (spermatogonia) undergo several mitotic cell divisions before undergoing meiosis and full maturation. The testes secrete several male sex hormones, the most abundant being testosterone, which is essential for growth and division of the immature sperm, and for the masculine characteristics of the male body. Three other male sex hormones, gonadotropin-releasing hormone (GnRH), luteinizing hormone (LH), and follicle-stimulating hormone (FSH), control sexual function.
The uterus, ovaries, fallopian tubes, vagina, and breasts comprise the female reproductive system. The ovaries and uterus are the source of ova and the location of fetal development, respectively. The fallopian tubes and vagina are accessory organs attached to the top and bottom of the uterus, respectively. Both the uterus and ovaries have additional roles in the development and loss of reproductive capability during a female's lifetime. The primary role of the breasts is lactation.
Multiple endocrine signals from the ovaries, uterus, pituitary, hypothalamus, adrenal glands, and other tissues coordinate reproduction and lactation. These signals vary during the monthly menstruation cycle and during the female's lifetime. Similarly, the sensitivity of reproductive organs to these endocrine signals varies during the female's lifetime. A combination of positive and negative feedback to the ovaries, pituitary and hypothalamus glands controls physiologic changes during the monthly ovulation and endometrial cycles. The anterior pituitary secretes two major gonadotropin hormones, follicle-stimulating hormone (FSH) and luteinizing hormone (LH), regulated by negative feedback of steroids, most notably by ovarian estradiol. If fertilization does not occur, estrogen and progesterone levels decrease. This sudden reduction of the ovarian hormones leads to menstruation, the desquamation of the endometrium.
Hormones further govern all the steps of pregnancy, parturition, lactation, and menopause. During pregnancy large quantities of human chorionic gonadotropin (hCG), estrogens, progesterone, and human chorionic somatomammotropin (hCS) are formed by the placenta. hCG, a glycoprotein similar to luteinizing hormone, stimulates the corpus luteum to continue producing more progesterone and estrogens, rather than to involute as occurs if the ovum is not fertilized. hCS is similar to growth hormone and is crucial for fetal nutrition.
The female breast also matures during pregnancy. Large amounts of estrogen secreted by the placenta trigger growth and branching of the breast milk ductal system while lactation is initiated by the secretion of prolactin by the pituitary gland. Parturition involves several hormonal changes that increase uterine contractility toward the end of pregnancy, as follows. The levels of estrogens increase more than those of progesterone. Oxytocin is secreted by the neurohypophysis. Concomitantly, uterine sensitivity to oxytocin increases. The fetus itself secretes oxytocin, cortisol (from adrenal glands), and prostaglandins.
Menopause occurs when most of the ovarian follicles have degenerated. The ovary then produces less estradiol, reducing the negative feedback on the pituitary and hypothalamus glands. Mean levels of circulating FSH and LH increase, even as ovulatory cycles continue. Therefore, the ovary is less responsive to gonadotropins, and there is an increase in the time between menstrual cycles. Consequently, menstrual bleeding ceases and reproductive capability ends.
Cell Differentiation and Proliferation
Tissue growth involves complex and ordered patterns of cell proliferation, cell differentiation, and apoptosis. Cell proliferation must be regulated to maintain both the number of cells and their spatial organization. This regulation depends upon the appropriate expression of proteins which control cell cycle progression in response to extracellular signals, such as growth factors and other mitogens, and intracellular cues, such as DNA damage or nutrient starvation. Molecules which directly or indirectly modulate cell cycle progression fall into several categories, including growth factors and their receptors, second messenger and signal transduction proteins, oncogene products, tumor-suppressor proteins, and mitosis-promoting factors.
Growth factors were originally described as serum factors required to promote cell proliferation. Most growth factors are large, secreted polypeptides that act on cells in their local environment. Growth factors bind to and activate specific cell surface receptors and initiate intracellular signal transduction cascades. Many growth factor receptors are classified as receptor tyrosine kinases which undergo autophosphorylation upon ligand binding. Autophosphorylation enables the receptor to interact with signal transduction proteins characterized by the presence of SH2 or SH3 domains (Src homology regions 2 or 3). These proteins then modulate the activity state of small G-proteins, such as Ras, Rab, and Rho, along with GTPase activating proteins (GAPs), guanine nucleotide releasing proteins (GNRPs), and other guanine nucleotide exchange factors. Small G-proteins act as molecular switches that activate other downstream events, such as mitogen-activated protein kinase (MAP kinase) cascades. MAP kinases ultimately activate transcription of mitosis-promoting genes. In addition to growth factors, small signaling peptides and hormones also influence cell proliferation. These molecules bind primarily to another class of receptor, the trimeric G-protein coupled receptor (GPCR), found predominantly on the surface of immune, neuronal and neuroendocrine cells. Upon ligand binding, the GPCR activates a trimeric G-protein which in turn triggers increased levels of intracellular second messengers such as phospholipase C, Ca2+, and cyclic AMP. Most GPCR-mediated signaling pathways indirectly promote cell proliferation by causing the secretion or breakdown of other signaling molecules that have direct mitogenic effects. These signaling cascades often involve activation of kinases and phosphatases. Some growth factors, such as some members of the transforming growth factor beta (TGF-β) family, act on some cells to stimulate cell proliferation and on other cells to inhibit it.
Growth factors may also stimulate a cell at one concentration and inhibit the same cell at another concentration. Most growth factors also have a multitude of other actions besides the regulation of cell growth and division: they can control the proliferation, survival, differentiation, migration, or function of cells depending on the circumstance. For example, the tumor necrosis factor/nerve growth factor (TNF/NGF) family can activate or inhibit cell death, as well as regulate proliferation and differentiation. The cell response depends on the type of cell, its stage of differentiation and transformation status, which surface receptors are stimulated, and the types of stimuli acting on the cell (Smith, A. et al. (1994) Cell 76:959-962; and Nocentini, G. et al. (1997) Proc. Natl. Acad. Sci. USA 94:6216-6221).
Neighboring cells in a tissue compete for growth factors, and when provided with "unlimited" quantities in a perfused system will grow to even higher cell densities before reaching density-dependent inhibition of cell division. Cells often demonstrate an anchorage dependence of cell division as well. This anchorage dependence may be associated with the formation of focal contacts linking the cytoskeleton with the extracellular matrix (ECM). The expression of ECM components can be stimulated by growth factors. For example, TGF-β stimulates fibroblasts to produce a variety of ECM proteins, including fibronectin, collagen, and tenascin (Pearson, C.A. et al. (1988) EMBO J. 7:2677-2981). In fact, for some cell types specific ECM molecules, such as laminin or fibronectin, may act as growth factors. Tenascin-C and -R, expressed in developing and lesioned neural tissue, provide stimulatory/anti-adhesive or inhibitory properties, respectively, for axonal growth (Faissner, A. (1997) Cell Tissue Res. 290:331-341).
Cancers are associated with the activation of oncogenes which are derived from normal cellular genes. These oncogenes encode oncoproteins which convert normal cells into malignant cells. Some oncoproteins are mutant isoforms of the normal protein, and other oncoproteins are abnormally expressed with respect to location or amount of expression. The latter category of oncoprotein causes cancer by altering transcriptional control of cell proliferation. Five classes of oncoproteins are known to affect cell cycle controls. These classes include growth factors, growth factor receptors, intracellular signal transducers, nuclear transcription factors, and cell-cycle control proteins. Viral oncogenes are integrated into the human genome after infection of human cells by certain viruses. Examples of viral oncogenes include v-src, v-abl, and v-fps.
Many oncogenes have been identified and characterized. These include sis, erbA, erbB, her-2, mutated Gs, src, abl, ras, crk, jun, fos, myc, and mutated tumor-suppressor genes such as RB, p53, mdm2, Cip1, p16, and cyclin D. Transformation of normal genes to oncogenes may also occur by chromosomal translocation. The Philadelphia chromosome, characteristic of chronic myeloid leukemia and a subset of acute lymphoblastic leukemias, results from a reciprocal translocation between chromosomes 9 and 22 that moves a truncated portion of the proto-oncogene c-abl to the breakpoint cluster region (bcr) on chromosome 22. Tumor-suppressor genes are involved in regulating cell proliferation. Mutations which cause reduced function or loss of function in tumor-suppressor genes result in uncontrolled cell proliferation. For example, the retinoblastoma gene product (RB), in a non-phosphorylated state, binds several early-response genes and suppresses their transcription, thus blocking cell division. Phosphorylation of RB causes it to dissociate from the genes, releasing the suppression, and allowing cell division to proceed.
Apoptosis
Apoptosis is the genetically controlled process by which unneeded or defective cells undergo programmed cell death. Selective elimination of cells is as important for morphogenesis and tissue remodeling as is cell proliferation and differentiation. Lack of apoptosis may result in hyperplasia and other disorders associated with increased cell proliferation. Apoptosis is also a critical component of the immune response. Immune cells such as cytotoxic T-cells and natural killer cells prevent the spread of disease by inducing apoptosis in tumor cells and virus-infected cells. In addition, immune cells that fail to distinguish self molecules from foreign molecules must be eliminated by apoptosis to avoid an autoimmune response.
Apoptotic cells undergo distinct morphological changes. Hallmarks of apoptosis include cell shrinkage, nuclear and cytoplasmic condensation, and alterations in plasma membrane topology. Biochemically, apoptotic cells are characterized by increased intracellular calcium concentration, fragmentation of chromosomal DNA, and expression of novel cell surface components.
The molecular mechanisms of apoptosis are highly conserved, and many of the key protein regulators and effectors of apoptosis have been identified. Apoptosis generally proceeds in response to a signal which is transduced intracellularly and results in altered patterns of gene expression and protein activity. Signaling molecules such as hormones and cytokines are known both to stimulate and to inhibit apoptosis through interactions with cell surface receptors. Transcription factors also play an important role in the onset of apoptosis. A number of downstream effector molecules, particularly proteases such as the cysteine proteases called caspases, have been implicated in the degradation of cellular components and the proteolytic activation of other apoptotic effectors.
Aging and Senescence
Studies of the aging process or senescence have shown a number of characteristic cellular and molecular changes (Fauci et al. (1998) Harrison's Principles of Internal Medicine, McGraw-Hill, New York NY, p.37). These characteristics include increases in chromosome structural abnormalities, DNA cross-linking, incidence of single-stranded breaks in DNA, losses in DNA methylation, and degradation of telomere regions. In addition to these DNA changes, posttranslational alterations of proteins increase, including deamidation, oxidation, cross-linking, and nonenzymatic glycation. Still further molecular changes occur in the mitochondria of aging cells through deterioration of structure. These changes eventually contribute to decreased function in every organ of the body.
Biochemical Pathway Molecules
Biochemical pathways are responsible for regulating metabolism, growth and development, protein secretion and trafficking, environmental responses, and ecological interactions including immune response and response to parasites.
DNA Replication
Deoxyribonucleic acid (DNA), the genetic material, is found in both the nucleus and mitochondria of human cells. The bulk of human DNA is nuclear, in the form of linear chromosomes, while mitochondrial DNA is circular. DNA replication begins at specific sites called origins of replication. Bidirectional synthesis occurs from the origin via two growing forks that move in opposite directions. Replication is semi-conservative, with each daughter duplex containing one old strand and its newly synthesized complementary partner. Proteins involved in DNA replication include DNA polymerases, DNA primase, telomerase, DNA helicase, topoisomerases, DNA ligases, replication factors, and DNA-binding proteins.
DNA Recombination and Repair
Cells are constantly faced with replication errors and environmental assault (such as ultraviolet irradiation) that can produce DNA damage. Damage to DNA consists of any change that modifies the structure of the molecule. Changes to DNA can be divided into two general classes, single base changes and structural distortions. Any damage to DNA can produce a mutation, and the mutation may produce a disorder, such as cancer.
Changes in DNA are recognized by repair systems within the cell. These repair systems act to correct the damage and thus prevent any deleterious effects of a mutational event. Repair systems can be divided into three general types, direct repair, excision repair, and retrieval systems. Proteins involved in DNA repair include DNA polymerase, excision repair proteins, excision and cross link repair proteins, recombination and repair proteins, RAD51 proteins, and BLM and WRN proteins that are homologs of RecQ helicase. When the repair systems are eliminated, cells become exceedingly sensitive to environmental mutagens, such as ultraviolet irradiation. Patients with disorders associated with a loss in DNA repair systems often exhibit a high sensitivity to environmental mutagens. Examples of such disorders include xeroderma pigmentosum (XP), Bloom's syndrome (BS), and Werner's syndrome (WS) (Yamagata, K. et al. (1998) Proc. Natl. Acad. Sci. USA 95:8733-8738), ataxia telangiectasia, Cockayne's syndrome, and Fanconi's anemia.
Recombination is the process whereby new DNA sequences are generated by the movements of large pieces of DNA. In homologous recombination, which occurs during meiosis and DNA repair, parent DNA duplexes align at regions of sequence similarity, and new DNA molecules form by the breakage and joining of homologous segments. Proteins involved include RAD51 recombinase. In site-specific recombination, two specific but not necessarily homologous DNA sequences are exchanged. In the immune system this process generates a diverse collection of antibody and T cell receptor genes. Proteins involved in site-specific recombination in the immune system include recombination activating genes 1 and 2 (RAG1 and RAG2). A defect in immune system site-specific recombination causes severe combined immunodeficiency disease in mice.
RNA Metabolism
Ribonucleic acid (RNA) is a linear single-stranded polymer of four nucleotides, ATP, CTP, UTP, and GTP. In most organisms, RNA is transcribed as a copy of DNA, the genetic material of the organism. In retroviruses RNA rather than DNA serves as the genetic material. RNA copies of the genetic material encode proteins or serve various structural, catalytic, or regulatory roles in organisms. RNA is classified according to its cellular localization and function. Messenger RNAs (mRNAs) encode polypeptides. Ribosomal RNAs (rRNAs) are assembled, along with ribosomal proteins, into ribosomes, which are cytoplasmic particles that translate mRNA into polypeptides. Transfer RNAs (tRNAs) are cytosolic adaptor molecules that function in mRNA translation by recognizing both an mRNA codon and the amino acid that matches that codon. Heterogeneous nuclear RNAs (hnRNAs) include mRNA precursors and other nuclear RNAs of various sizes. Small nuclear RNAs (snRNAs) are a part of the nuclear spliceosome complex that removes intervening, non-coding sequences (introns) and rejoins exons in pre-mRNAs.
RNA Transcription
The transcription process synthesizes an RNA copy of DNA. Proteins involved include multi-subunit RNA polymerases and transcription factors IIA, IIB, IID, IIE, IIF, IIH, and IIJ. Many transcription factors incorporate DNA-binding structural motifs which comprise either α-helices or β-sheets that bind to the major groove of DNA. Four well-characterized structural motifs are helix-turn-helix, zinc finger, leucine zipper, and helix-loop-helix.
RNA Processing
Various proteins are necessary for processing of transcribed RNAs in the nucleus. Pre-mRNA processing steps include capping at the 5' end with methylguanosine, polyadenylating the 3' end, and splicing to remove introns. The spliceosomal complex is comprised of five small nuclear ribonucleoprotein particles (snRNPs) designated U1, U2, U4, U5, and U6. Each snRNP contains a single species of snRNA and about ten proteins. The RNA components of some snRNPs recognize and base-pair with intron consensus sequences. The protein components mediate spliceosome assembly and the splicing reaction. Autoantibodies to snRNP proteins are found in the blood of patients with systemic lupus erythematosus (Stryer, L. (1995) Biochemistry, W.H. Freeman and Company, New York NY, p. 863).
Heterogeneous nuclear ribonucleoproteins (hnRNPs) have been identified that have roles in splicing, exporting of the mature RNAs to the cytoplasm, and mRNA translation (Biamonti, G. et al. (1998) Clin. Exp. Rheumatol. 16:317-326). Some examples of hnRNPs include the yeast proteins Hrp1p, involved in cleavage and polyadenylation at the 3' end of the RNA; Cbp80p, involved in capping the 5' end of the RNA; and Npl3p, a homolog of mammalian hnRNP A1, involved in export of mRNA from the nucleus (Shen, E.G. et al. (1998) Genes Dev. 12:679-691). HnRNPs have been shown to be important targets of the autoimmune response in rheumatic diseases (Biamonti, supra). Many snRNP proteins, hnRNP proteins, and alternative splicing factors are characterized by an RNA recognition motif (RRM). (Reviewed in Birney, E. et al. (1993) Nucleic Acids Res. 21:5803-5816.) The RRM is about 80 amino acids in length and forms four β-strands and two α-helices arranged in an α/β sandwich. The RRM contains a core RNP-1 octapeptide motif along with surrounding conserved sequences.
RNA Stability and Degradation
RNA helicases alter and regulate RNA conformation and secondary structure by using energy derived from ATP hydrolysis to destabilize and unwind RNA duplexes. The most well-characterized and ubiquitous family of RNA helicases is the DEAD-box family, so named for the conserved B-type ATP-binding motif which is diagnostic of proteins in this family. DEAD-box helicases have been identified in organisms as diverse as bacteria, insects, yeast, amphibians, mammals, and plants. DEAD-box helicases function in diverse processes such as translation initiation, splicing, ribosome assembly, and RNA editing, transport, and stability. Some DEAD-box helicases play tissue- and stage-specific roles in spermatogenesis and embryogenesis. (Reviewed in Linder, P. et al. (1989) Nature 337:121-122.)
Overexpression of the DEAD-box 1 protein (DDX1) may play a role in the progression of neuroblastoma (Nb) and retinoblastoma (Rb) tumors. Other DEAD-box helicases have been implicated either directly or indirectly in ultraviolet light-induced tumors, B cell lymphoma, and myeloid malignancies. (Reviewed in Godbout, R. et al. (1998) J. Biol. Chem. 273:21161-21168.)
Ribonucleases (RNases) catalyze the hydrolysis of phosphodiester bonds in RNA chains, thus cleaving the RNA. For example, RNase P is a ribonucleoprotein enzyme which cleaves the 5' end of pre-tRNAs as part of their maturation process. RNase H digests the RNA strand of an RNA/DNA hybrid. Such hybrids occur in cells invaded by retroviruses, and RNase H is an important enzyme in the retroviral replication cycle. RNase H domains are often found associated with reverse transcriptases. RNase activity in serum and cell extracts is elevated in a variety of cancers and infectious diseases (Schein, C.H. (1997) Nat. Biotechnol. 15:529-536). Regulation of RNase activity is being investigated as a means to control tumor angiogenesis, allergic reactions, viral infection and replication, and fungal infections.
Protein Translation
The eukaryotic ribosome is composed of a 60S (large) subunit and a 40S (small) subunit, which together form the 80S ribosome. In addition to the 18S, 28S, 5S, and 5.8S rRNAs, the ribosome also contains more than fifty proteins. The ribosomal proteins have a prefix which denotes the subunit to which they belong, either L (large) or S (small). Three important sites are identified on the ribosome. The aminoacyl-tRNA site (A site) is where charged tRNAs (with the exception of the initiator-tRNA) bind on arrival at the ribosome. The peptidyl-tRNA site (P site) is where new peptide bonds are formed, as well as where the initiator tRNA binds. The exit site (E site) is where deacylated tRNAs bind prior to their release from the ribosome. (Translation is reviewed in Stryer, L. (1995) Biochemistry, W.H. Freeman and Company, New York NY, pp. 875-908; and Lodish, H. et al. (1995) Molecular Cell Biology, Scientific American Books, New York NY, pp. 119-138.)
tRNA Charging
Protein biosynthesis depends on each amino acid forming a linkage with the appropriate tRNA. The aminoacyl-tRNA synthetases are responsible for the activation and correct attachment of an amino acid with its cognate tRNA. The 20 aminoacyl-tRNA synthetase enzymes can be divided into two structural classes, Class I and Class II. Autoantibodies against aminoacyl-tRNAs are generated by patients with dermatomyositis and polymyositis, and correlate strongly with complicating interstitial lung disease (ILD). These antibodies appear to be generated in response to viral infection, and coxsackie virus has been used to induce experimental viral myositis in animals.
Translation Initiation
Initiation of translation can be divided into three stages. The first stage brings an initiator transfer RNA (Met-tRNAf) together with the 40S ribosomal subunit to form the 43S preinitiation complex. The second stage binds the 43S preinitiation complex to the mRNA, followed by migration of the complex to the correct AUG initiation codon. The third stage brings the 60S ribosomal subunit to the 40S subunit to generate an 80S ribosome at the initiation codon. Regulation of translation primarily involves the first and second stage in the initiation process (Pain, V.M. (1996) Eur. J. Biochem. 236:747-771).
Several initiation factors, many of which contain multiple subunits, are involved in bringing an initiator tRNA and 40S ribosomal subunit together. eIF2, a guanine nucleotide binding protein, recruits the initiator tRNA to the 40S ribosomal subunit. Only when eIF2 is bound to GTP does it associate with the initiator tRNA. eIF2B, a guanine nucleotide exchange protein, is responsible for converting eIF2 from the GDP-bound inactive form to the GTP-bound active form. Two other factors, eIF1A and eIF3, bind and stabilize the 40S subunit by interacting with 18S ribosomal RNA and specific ribosomal structural proteins. eIF3 is also involved in association of the 40S ribosomal subunit with mRNA. The Met-tRNAf, eIF1A, eIF3, and 40S ribosomal subunit together make up the 43S preinitiation complex (Pain, supra).
Additional factors are required for binding of the 43S preinitiation complex to an mRNA molecule, and the process is regulated at several levels. eIF4F is a complex consisting of three proteins: eIF4E, eIF4A, and eIF4G. eIF4E recognizes and binds to the mRNA 5'-terminal m7GTP cap, eIF4A is a bidirectional RNA-dependent helicase, and eIF4G is a scaffolding polypeptide. eIF4G has three binding domains. The N-terminal third of eIF4G interacts with eIF4E, the central third interacts with eIF4A, and the C-terminal third interacts with eIF3 bound to the 43S preinitiation complex. Thus, eIF4G acts as a bridge between the 40S ribosomal subunit and the mRNA (Hentze, M.W. (1997) Science 275:500-501).
The ability of eIF4F to initiate binding of the 43S preinitiation complex is regulated by structural features of the mRNA. The mRNA molecule has an untranslated region (UTR) between the 5' cap and the AUG start codon. In some mRNAs this region forms secondary structures that impede binding of the 43S preinitiation complex. The helicase activity of eIF4A is thought to function in removing this secondary structure to facilitate binding of the 43S preinitiation complex (Pain, supra).
Translation Elongation
Elongation is the process whereby additional amino acids are joined to the initiator methionine to form the complete polypeptide chain. The elongation factors EF1α, EF1βγ, and EF2 are involved in elongating the polypeptide chain following initiation. EF1α is a GTP-binding protein. In its GTP-bound form, EF1α brings an aminoacyl-tRNA to the ribosome's A site. The amino acid attached to the newly arrived aminoacyl-tRNA forms a peptide bond with the initiator methionine. The GTP on EF1α is hydrolyzed to GDP, and EF1α-GDP dissociates from the ribosome. EF1βγ binds EF1α-GDP and induces the dissociation of GDP from EF1α, allowing EF1α to bind GTP and a new cycle to begin.
As subsequent aminoacyl-tRNAs are brought to the ribosome, EF-G, another GTP-binding protein, catalyzes the translocation of tRNAs from the A site to the P site and finally to the E site of the ribosome. This allows the processivity of translation.
Translation Termination
The release factor eRF carries out termination of translation. eRF recognizes stop codons in the mRNA, leading to the release of the polypeptide chain from the ribosome.
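The initiation, elongation, and termination steps described above can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not a model of the ribosome: it assumes an uppercase RNA string, starts at the first AUG it finds (ignoring cap binding, scanning context, and initiation factors), reads codons in frame using the standard genetic code, and stops at the first stop codon, which stands in for recognition by eRF. The names `translate` and `CODON_TABLE` are hypothetical and chosen only for this illustration.

```python
from itertools import product

# Standard genetic code, laid out in the conventional U, C, A, G order.
# "*" marks the three stop codons (UAA, UAG, UGA).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon.

    Assumes `mrna` contains only the uppercase letters A, C, G, and U.
    Returns one-letter amino acid codes, or "" if no AUG is present.
    """
    start = mrna.find("AUG")  # simplified stand-in for start-codon selection
    if start == -1:
        return ""
    peptide = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":  # stop codon: termination, chain is released
            break
        peptide.append(residue)
    return "".join(peptide)


print(translate("GGAUGGCUUAAGG"))  # AUG GCU UAA -> "MA"
```

In this sketch the 5' untranslated region is simply the text before the first AUG, and any sequence after the stop codon is ignored.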
Post-Translational Pathways
Proteins may be modified after translation by the addition of phosphate, sugar, prenyl, fatty acid, and other chemical groups. These modifications are often required for proper protein activity.
Enzymes involved in post-translational modification include kinases, phosphatases, glycosyltransferases, and prenyltransferases. The conformation of proteins may also be modified after translation by the introduction and rearrangement of disulfide bonds (rearrangement catalyzed by protein disulfide isomerase), the isomerization of proline sidechains by prolyl isomerase, and by interactions with molecular chaperone proteins.
Proteins may also be cleaved by proteases. Such cleavage may result in activation, inactivation, or complete degradation of the protein. Proteases include serine proteases, cysteine proteases, aspartic proteases, and metalloproteases. Signal peptidase in the endoplasmic reticulum (ER) lumen cleaves the signal peptide from membrane or secretory proteins that are imported into the ER. Ubiquitin proteases are associated with the ubiquitin conjugation system (UCS), a major pathway for the degradation of cellular proteins in eukaryotic cells and some bacteria. The UCS mediates the elimination of abnormal proteins and regulates the half-lives of important regulatory proteins that control cellular processes such as gene transcription and cell cycle progression. In the UCS pathway, proteins targeted for degradation are conjugated to ubiquitin, a small heat stable protein. Proteins involved in the UCS include ubiquitin-activating enzyme, ubiquitin-conjugating enzymes, ubiquitin-ligases, and ubiquitin C-terminal hydrolases. The ubiquitinated protein is then recognized and degraded by the proteasome, a large, multisubunit proteolytic enzyme complex, and ubiquitin is released for reutilization by ubiquitin protease.
Lipid Metabolism
Lipids are water-insoluble, oily or greasy substances that are soluble in nonpolar solvents such as chloroform or ether. Neutral fats (triacylglycerols) serve as major fuels and energy stores. Polar lipids, such as phospholipids, sphingolipids, glycolipids, and cholesterol, are key structural components of cell membranes. Lipid metabolism is involved in human diseases and disorders. In the arterial disease atherosclerosis, fatty lesions form on the inside of the arterial wall. These lesions promote the loss of arterial flexibility and the formation of blood clots (Guyton, A.C. (1991) Textbook of Medical Physiology, W.B. Saunders Company, Philadelphia PA, pp. 760-763).
In Tay-Sachs disease, the GM2 ganglioside (a sphingolipid) accumulates in lysosomes of the central nervous system due to a lack of the enzyme N-acetylhexosaminidase. Patients suffer nervous system degeneration leading to early death (Fauci, A.S. et al. (1998) Harrison's Principles of Internal Medicine, McGraw-Hill, New York NY, p. 2171). The Niemann-Pick diseases are caused by defects in lipid metabolism. Niemann-Pick diseases types A and B are caused by accumulation of sphingomyelin (a sphingolipid) and other lipids in the central nervous system due to a defect in the enzyme sphingomyelinase, leading to neurodegeneration and lung disease. Niemann-Pick disease type C results from a defect in cholesterol transport, leading to the accumulation of sphingomyelin and cholesterol in lysosomes and a secondary reduction in sphingomyelinase activity. Neurological symptoms such as grand mal seizures, ataxia, and loss of previously learned speech manifest 1-2 years after birth. A mutation in the NPC protein, which contains a putative cholesterol-sensing domain, was found in a mouse model of Niemann-Pick disease type C (Fauci, supra, p. 2175; Loftus, S.K. et al. (1997) Science 277:232-235). (Lipid metabolism is reviewed in Stryer, L. (1995) Biochemistry, W.H. Freeman and Company, New York NY; Lehninger, A. (1982) Principles of Biochemistry, Worth Publishers, Inc., New York NY; and ExPASy "Biochemical Pathways" index of Boehringer Mannheim World Wide Web site.)
Fatty Acid Synthesis
Fatty acids are long-chain organic acids with a single carboxyl group and a long non-polar hydrocarbon tail. Long-chain fatty acids are essential components of glycolipids, phospholipids, and cholesterol, which are building blocks for biological membranes, and of triglycerides, which are biological fuel molecules. Long-chain fatty acids are also substrates for eicosanoid production, and are important in the functional modification of certain complex carbohydrates and proteins. 16-carbon and 18-carbon fatty acids are the most common.
Fatty acid synthesis occurs in the cytoplasm. In the first step, acetyl-Coenzyme A (CoA) carboxylase (ACC) synthesizes malonyl-CoA from acetyl-CoA and bicarbonate. The enzymes which catalyze the remaining reactions are covalently linked into a single polypeptide chain, referred to as the multifunctional enzyme fatty acid synthase (FAS). FAS catalyzes the synthesis of palmitate from acetyl-CoA and malonyl-CoA. FAS contains acetyl transferase, malonyl transferase, β-ketoacyl synthase, acyl carrier protein, β-ketoacyl reductase, dehydratase, enoyl reductase, and thioesterase activities. The final product of the FAS reaction is the 16-carbon fatty acid palmitate. Further elongation, as well as unsaturation, of palmitate by accessory enzymes of the ER produces the variety of long chain fatty acids required by the individual cell. These enzymes include a NADH-cytochrome b5 reductase, cytochrome b5, and a desaturase.
Phospholipid and Triacylglycerol Synthesis
Triacylglycerols, also known as triglycerides and neutral fats, are major energy stores in animals. Triacylglycerols are esters of glycerol with three fatty acid chains. Glycerol-3-phosphate is produced from dihydroxyacetone phosphate by the enzyme glycerol phosphate dehydrogenase or from glycerol by glycerol kinase. Fatty acyl-CoAs are produced from fatty acids by fatty acyl-CoA synthetases. Glycerol-3-phosphate is acylated with two fatty acyl-CoAs by the enzyme glycerol phosphate acyltransferase to give phosphatidate. Phosphatidate phosphatase converts phosphatidate to diacylglycerol, which is subsequently acylated to a triacylglycerol by the enzyme diglyceride acyltransferase. Phosphatidate phosphatase and diglyceride acyltransferase form a triacylglycerol synthetase complex bound to the ER membrane.
A major class of phosphoHpids are the phosphoglycerides, which are composed of a glycerol backbone, two fatty acid chains, and a phosphorylated alcohol. Phosphoglycerides are components of ceU membranes. Principal phosphoglycerides are phosphatidyl choHne, phosphatidyl ethanolamine, phosphatidyl serine, phosphatidyl inositol, and diphosphatidyl glycerol. Many enzymes involved in 0 phosphoglyceride synthesis are associated with membranes (Meyers, R.A. (1995) Molecular Biology and Biotechnology, VCH PubHshers Inc., New York NY, pp. 494-501). Phosphatidate is converted to CDP-diacylglycerolby the enzyme phosphatidate cytidylyltransferase (ExPASy ENZYME EC 2JJ.41). Transfer of the diacylglycerol group from CDP-diacylglycerol to serine to yield phosphatidyl serine, or to inositol to yield phosphatidyl inositol, is catalyzed by the enzymes CDP- 5 diacylglycerol-serine O-phosphatidyltransferase and CDP-diacylglycerol-inositol 3- phosphatidyltransferase, respectively (ExPASy ENZYME EC 2J.8.8; ExPASy ENZYME EC 2J.8.11). The enzyme phosphatidyl serine decarboxylase catalyzes the conversion of phosphatidyl serine to phosphatidyl ethanolamine, using a pyruvate cofactor (Voelker, D.R. (1997) Biochim. Biophys. Acta 1348:236-244). Phosphatidyl choHne is formed using diet-derived choline by the o reaction of CDP-choHne with 1 ,2-diacylglycerol, catalyzed by diacylglycerol cholinephosphotransferase (ExPASy ENZYME 2J.8.2). Sterol, Steroid, and Isoprenoid MetaboHsm
Cholesterol, composed of four fused hydrocarbon rings with an alcohol at one end, moderates the fluidity of membranes in which it is incorporated. In addition, cholesterol is used in the synthesis of steroid hormones such as cortisol, progesterone, estrogen, and testosterone. Bile salts derived from cholesterol facilitate the digestion of lipids. Cholesterol in the skin forms a barrier that prevents excess water evaporation from the body. Farnesyl and geranylgeranyl groups, which are derived from cholesterol biosynthesis intermediates, are post-translationally added to signal transduction proteins such as ras and protein-targeting proteins such as rab. These modifications are important for the activities of these proteins (Guyton, supra; Stryer, supra, pp. 279-280, 691-702, 934).
Mammals obtain cholesterol derived from both de novo biosynthesis and the diet. The liver is the major site of cholesterol biosynthesis in mammals. Two acetyl-CoA molecules initially condense to form acetoacetyl-CoA, catalyzed by a thiolase. Acetoacetyl-CoA condenses with a third acetyl-CoA to form hydroxymethylglutaryl-CoA (HMG-CoA), catalyzed by HMG-CoA synthase. Conversion of HMG-CoA to cholesterol is accomplished via a series of enzymatic steps known as the mevalonate pathway. The rate-limiting step is the conversion of HMG-CoA to mevalonate by HMG-CoA reductase. The drug lovastatin, a potent inhibitor of HMG-CoA reductase, is given to patients to reduce their serum cholesterol levels. Other mevalonate pathway enzymes include mevalonate kinase, phosphomevalonate kinase, diphosphomevalonate decarboxylase, isopentenyldiphosphate isomerase, dimethylallyl transferase, geranyl transferase, farnesyl-diphosphate farnesyltransferase, squalene monooxygenase, lanosterol synthase, lathosterol oxidase, and 7-dehydrocholesterol reductase. Cholesterol is used in the synthesis of steroid hormones such as cortisol, progesterone, aldosterone, estrogen, and testosterone. First, cholesterol is converted to pregnenolone by cholesterol monooxygenases. The other steroid hormones are synthesized from pregnenolone by a series of enzyme-catalyzed reactions including oxidations, isomerizations, hydroxylations, reductions, and demethylations. Examples of these enzymes include steroid Δ-isomerase, 3β-hydroxy-Δ5-steroid dehydrogenase, steroid 21-monooxygenase, steroid 19-hydroxylase, and 3β-hydroxysteroid dehydrogenase. Cholesterol is also the precursor to vitamin D.
Numerous compounds contain 5-carbon isoprene units derived from the mevalonate pathway intermediate isopentenyl pyrophosphate. Isoprenoid groups are found in vitamin K, ubiquinone, retinal, dolichol phosphate (a carrier of oligosaccharides needed for N-linked glycosylation), and farnesyl and geranylgeranyl groups that modify proteins. Enzymes involved include farnesyl transferase, polyprenyl transferases, dolichyl phosphatase, and dolichyl kinase.
Sphingolipid Metabolism
Sphingolipids are an important class of membrane lipids that contain sphingosine, a long-chain amino alcohol. They are composed of one long-chain fatty acid, one polar head alcohol, and sphingosine or a sphingosine derivative. The three classes of sphingolipids are sphingomyelins, cerebrosides, and gangliosides. Sphingomyelins, which contain phosphocholine or phosphoethanolamine as their head group, are abundant in the myelin sheath surrounding nerve cells. Galactocerebrosides, which contain a glucose or galactose head group, are characteristic of the brain. Other cerebrosides are found in nonneural tissues. Gangliosides, whose head groups contain multiple sugar units, are abundant in the brain, but are also found in nonneural tissues.
Sphingolipids are built on a sphingosine backbone. Sphingosine is acylated to ceramide by the enzyme sphingosine acetyltransferase. Ceramide and phosphatidyl choline are converted to sphingomyelin by the enzyme ceramide choline phosphotransferase. Cerebrosides are synthesized by the linkage of glucose or galactose to ceramide by a transferase. Sequential addition of sugar residues to ceramide by transferase enzymes yields gangliosides.
Eicosanoid Metabolism
Eicosanoids, including prostaglandins, prostacyclin, thromboxanes, and leukotrienes, are 20-carbon molecules derived from fatty acids. Eicosanoids are signaling molecules which have roles in pain, fever, and inflammation. The precursor of all eicosanoids is arachidonate, which is generated from phospholipids by phospholipase A2 and from diacylglycerols by diacylglycerol lipase. Leukotrienes are produced from arachidonate by the action of lipoxygenases. Prostaglandin synthase, reductases, and isomerases are responsible for the synthesis of the prostaglandins. Prostaglandins have roles in inflammation, blood flow, ion transport, synaptic transmission, and sleep. Prostacyclin and the thromboxanes are derived from a precursor prostaglandin by the action of prostacyclin synthase and thromboxane synthases, respectively.
Ketone Body Metabolism
Pairs of acetyl-CoA molecules derived from fatty acid oxidation in the liver can condense to form acetoacetyl-CoA, which subsequently forms acetoacetate, D-3-hydroxybutyrate, and acetone. These three products are known as ketone bodies. Enzymes involved in ketone body metabolism include HMG-CoA synthetase, HMG-CoA cleavage enzyme, D-3-hydroxybutyrate dehydrogenase, acetoacetate decarboxylase, and 3-ketoacyl-CoA transferase. Ketone bodies are a normal fuel supply of the heart and renal cortex. Acetoacetate produced by the liver is transported to cells where the acetoacetate is converted back to acetyl-CoA and enters the citric acid cycle. In times of starvation, ketone bodies produced from stored triacylglycerols become an important fuel source, especially for the brain. Abnormally high levels of ketone bodies are observed in diabetics. Diabetic coma can result if ketone body levels become too great.
Lipid Mobilization
Within cells, fatty acids are transported by cytoplasmic fatty acid binding proteins (Online
Mendelian Inheritance in Man (OMIM) *134650 Fatty Acid-Binding Protein 1, Liver; FABP1). Diazepam binding inhibitor (DBI), also known as endozepine and acyl CoA-binding protein, is an endogenous γ-aminobutyric acid (GABA) receptor ligand which is thought to down-regulate the effects of GABA. DBI binds medium- and long-chain acyl-CoA esters with very high affinity and may function as an intracellular carrier of acyl-CoA esters (OMIM *125950 Diazepam Binding Inhibitor; DBI; PROSITE PDOC00686 Acyl-CoA-binding protein signature).
Fat stored in liver and adipose triglycerides may be released by hydrolysis and transported in the blood. Free fatty acids are transported in the blood by albumin. Triacylglycerols and cholesterol esters in the blood are transported in lipoprotein particles. The particles consist of a core of hydrophobic lipids surrounded by a shell of polar lipids and apolipoproteins. The protein components serve in the solubilization of hydrophobic lipids and also contain cell-targeting signals. Lipoproteins include chylomicrons, chylomicron remnants, very-low-density lipoproteins (VLDL), intermediate-density lipoproteins (IDL), low-density lipoproteins (LDL), and high-density lipoproteins (HDL). There is a strong inverse correlation between the levels of plasma HDL and risk of premature coronary heart disease.
Triacylglycerols in chylomicrons and VLDL are hydrolyzed by lipoprotein lipases that line blood vessels in muscle and other tissues that use fatty acids. Cell surface LDL receptors bind LDL particles which are then internalized by endocytosis. Absence of the LDL receptor, the cause of the disease familial hypercholesterolemia, leads to increased plasma cholesterol levels and ultimately to
atherosclerosis. Plasma cholesteryl ester transfer protein mediates the transfer of cholesteryl esters from HDL to apolipoprotein B-containing lipoproteins. Cholesteryl ester transfer protein is important in the reverse cholesterol transport system and may play a role in atherosclerosis (Yamashita, S. et al. (1997) Curr. Opin. Lipidol. 8:101-110). Macrophage scavenger receptors, which bind and internalize modified lipoproteins, play a role in lipid transport and may contribute to atherosclerosis (Greaves,
D.R. et al. (1998) Curr. Opin. Lipidol. 9:425-432).
Proteins involved in cholesterol uptake and biosynthesis are tightly regulated in response to cellular cholesterol levels. The sterol regulatory element binding protein (SREBP) is a sterol-responsive transcription factor. Under normal cholesterol conditions, SREBP resides in the ER membrane. When cholesterol levels are low, a regulated cleavage of SREBP occurs which releases
the extracellular domain of the protein. This cleaved domain is then transported to the nucleus where it activates the transcription of the LDL receptor gene, and genes encoding enzymes of cholesterol synthesis, by binding the sterol regulatory element (SRE) upstream of the genes (Yang, J. et al. (1995) J. Biol. Chem. 270:12152-12161). Regulation of cholesterol uptake and biosynthesis also occurs via the oxysterol-binding protein (OSBP). OSBP is a high-affinity intracellular receptor for a variety of
oxysterols that down-regulate cholesterol synthesis and stimulate cholesterol esterification (Lagace, T.A. et al. (1997) Biochem. J. 326:205-213).
Beta-oxidation
Mitochondrial and peroxisomal beta-oxidation enzymes degrade saturated and unsaturated fatty acids by sequential removal of two-carbon units from CoA-activated fatty acids. The main beta-oxidation pathway degrades both saturated and unsaturated fatty acids while the auxiliary pathway performs additional steps required for the degradation of unsaturated fatty acids.
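As a rough illustration of the sequential two-carbon removal described above, the following sketch counts the oxidation cycles and acetyl-CoA units obtained from a saturated, even-chain fatty acyl-CoA. The function name `beta_oxidation_yield` is hypothetical and not part of the source. The arithmetic assumes complete degradation and ignores the auxiliary steps needed for unsaturated or odd-chain fatty acids.

```python
from typing import Tuple


def beta_oxidation_yield(carbons: int) -> Tuple[int, int]:
    """Return (cycles, acetyl_coa) for a saturated, even-chain fatty acyl-CoA.

    Illustrative assumption: each cycle removes one two-carbon unit as
    acetyl-CoA, and the final cycle splits a four-carbon acyl-CoA into
    two acetyl-CoA molecules.
    """
    if carbons < 4 or carbons % 2 != 0:
        raise ValueError("expected an even chain length of at least 4 carbons")
    acetyl_coa = carbons // 2   # every pair of carbons ends up as one acetyl-CoA
    cycles = acetyl_coa - 1     # the last cycle releases two acetyl-CoA at once
    return cycles, acetyl_coa


if __name__ == "__main__":
    # Palmitoyl-CoA has 16 carbons.
    print(beta_oxidation_yield(16))
```

For a 16-carbon chain such as palmitoyl-CoA, this simple count gives seven cycles and eight acetyl-CoA units.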
The pathways of mitochondrial and peroxisomal beta-oxidation use similar enzymes, but have different substrate specificities and functions. Mitochondria oxidize short-, medium-, and long-chain fatty acids to produce energy for cells. Mitochondrial beta-oxidation is a major energy source for cardiac and skeletal muscle. In liver, it provides ketone bodies to the peripheral circulation when glucose levels are low as in starvation, endurance exercise, and diabetes (Eaton, S. et al. (1996) Biochem. J. 320:345-357). Peroxisomes oxidize medium-, long-, and very-long-chain fatty acids, dicarboxylic fatty acids, branched fatty acids, prostaglandins, xenobiotics, and bile acid intermediates. The chief roles of peroxisomal beta-oxidation are to shorten toxic lipophilic carboxylic acids to facilitate their excretion and to shorten very-long-chain fatty acids prior to mitochondrial beta-oxidation (Mannaerts, G.P. and P.P. van Veldhoven (1993) Biochimie 75:147-158).
Enzymes involved in beta-oxidation include acyl CoA synthetase, carnitine acyltransferase, acyl CoA dehydrogenases, enoyl CoA hydratases, L-3-hydroxyacyl CoA dehydrogenase, β-ketothiolase, 2,4-dienoyl CoA reductase, and isomerase.
Lipid Cleavage and Degradation
Triglycerides are hydrolyzed to fatty acids and glycerol by lipases. Lysophospholipases (LPLs) are widely distributed enzymes that metabolize intracellular lipids, and occur in numerous isoforms. Small isoforms, approximately 15-30 kD, function as hydrolases; large isoforms, those exceeding 60 kD, function both as hydrolases and transacylases. A particular substrate for LPLs, lysophosphatidylcholine, causes lysis of cell membranes when it is formed or imported into a cell. LPLs are regulated by lipid factors including acylcarnitine, arachidonic acid, and phosphatidic acid. These lipid factors are signaling molecules important in numerous pathways, including the inflammatory response. (Anderson, R. et al. (1994) Toxicol. Appl. Pharmacol. 125:176-183; Selle, H. et al. (1993) Eur. J. Biochem. 212:411-416.) The secretory phospholipase A2 (PLA2) superfamily comprises a number of heterogeneous enzymes whose common feature is to hydrolyze the sn-2 fatty acid acyl ester bond of phosphoglycerides. Hydrolysis of the glycerophospholipids releases free fatty acids and lysophospholipids. PLA2 activity generates precursors for the biosynthesis of biologically active lipids, hydroxy fatty acids, and platelet-activating factor. PLA2 hydrolysis of the sn-2 ester bond in phospholipids generates free fatty acids, such as arachidonic acid, and lysophospholipids.
Carbon and Carbohydrate Metabolism
Carbohydrates, including sugars or saccharides, starch, and cellulose, are aldehyde or ketone compounds with multiple hydroxyl groups. The importance of carbohydrate metabolism is demonstrated by the sensitive regulatory system in place for maintenance of blood glucose levels. Two pancreatic hormones, insulin and glucagon, promote increased glucose uptake and storage by cells, and increased glucose release from cells, respectively. Carbohydrates have three important roles in mammalian cells. First, carbohydrates are used as energy stores, fuels, and metabolic intermediates. Carbohydrates are broken down to form energy in glycolysis and are stored as glycogen for later use. Second, the sugars deoxyribose and ribose form part of the structural support of DNA and RNA, respectively. Third, carbohydrate modifications are added to secreted and membrane proteins and lipids as they traverse the secretory pathway. Cell surface carbohydrate-containing macromolecules, including glycoproteins, glycolipids, and transmembrane proteoglycans, mediate adhesion with other cells and with components of the extracellular matrix. The extracellular matrix is composed of diverse glycoproteins, glycosaminoglycans (GAGs), and carbohydrate-binding proteins which are secreted from the cell and assembled into an organized meshwork in close association with the cell surface. The interaction of the cell with the surrounding matrix profoundly influences cell shape, strength, flexibility, motility, and adhesion. These dynamic properties are intimately associated with signal transduction pathways controlling cell proliferation and differentiation, tissue construction, and embryonic development.
Carbohydrate metabolism is altered in several disorders including diabetes mellitus, hyperglycemia, hypoglycemia, galactosemia, galactokinase deficiency, and UDP-galactose-4-epimerase deficiency (Fauci, A.S. et al. (1998) Harrison's Principles of Internal Medicine, McGraw-Hill, New York NY, pp. 2208-2209).
Altered carbohydrate metabolism is associated with cancer. Reduced GAG and proteoglycan expression is associated with human lung carcinomas (Nackaerts, K. et al. (1997) Int. J. Cancer 74:335-345). The carbohydrate determinants sialyl Lewis A and sialyl Lewis X are frequently expressed on human cancer cells (Kannagi, R. (1997) Glycoconj. J. 14:577-584). Alterations of the N-linked carbohydrate core structure of cell surface glycoproteins are linked to colon and pancreatic cancers (Schwarz, R.E. et al. (1996) Cancer Lett. 107:285-291). Reduced expression of the Sda blood group carbohydrate structure in cell surface glycolipids and glycoproteins is observed in gastrointestinal cancer (Dohi, T. et al. (1996) Int. J. Cancer 67:626-663). (Carbon and carbohydrate metabolism is reviewed in Stryer, L. (1995) Biochemistry W.H. Freeman and Company, New York NY; Lehninger, A.L. (1982) Principles of Biochemistry Worth Publishers Inc., New York NY; and Lodish, H. et al. (1995) Molecular Cell Biology Scientific American Books, New York NY.)
Glycolysis
Enzymes of the glycolytic pathway convert the sugar glucose to pyruvate while simultaneously producing ATP. The pathway also provides building blocks for the synthesis of cellular components such as long-chain fatty acids. After glycolysis, pyruvate is converted to acetyl-Coenzyme A, which, in aerobic organisms, enters the citric acid cycle. Glycolytic enzymes include hexokinase, phosphoglucose isomerase, phosphofructokinase, aldolase, triose phosphate isomerase, glyceraldehyde 3-phosphate dehydrogenase, phosphoglycerate kinase, phosphoglyceromutase, enolase, and pyruvate kinase. Of these, phosphofructokinase, hexokinase, and pyruvate kinase are important in regulating the rate of glycolysis.
Gluconeogenesis
Gluconeogenesis is the synthesis of glucose from noncarbohydrate precursors such as lactate and amino acids. The pathway, which functions mainly in times of starvation and intense exercise, occurs mostly in the liver and kidney. Responsible enzymes include pyruvate carboxylase, phosphoenolpyruvate carboxykinase, fructose 1,6-bisphosphatase, and glucose-6-phosphatase.
Pentose Phosphate Pathway
Pentose phosphate pathway enzymes are responsible for generating the reducing agent NADPH, while at the same time oxidizing glucose-6-phosphate to ribose-5-phosphate. Ribose-5-phosphate and its derivatives become part of important biological molecules such as ATP, Coenzyme A, NAD+, FAD, RNA, and DNA. The pentose phosphate pathway has both oxidative and non-oxidative branches. The oxidative branch steps, which are catalyzed by the enzymes glucose-6-phosphate dehydrogenase, lactonase, and 6-phosphogluconate dehydrogenase, convert glucose-6-phosphate and NADP+ to ribulose-5-phosphate and NADPH. The non-oxidative branch steps, which are catalyzed by the enzymes phosphopentose isomerase, phosphopentose epimerase, transketolase, and transaldolase, allow the interconversion of three-, four-, five-, six-, and seven-carbon sugars.
Glucuronate Metabolism
Glucuronate is a monosaccharide which, in the form of D-glucuronic acid, is found in the GAGs chondroitin and dermatan. D-glucuronic acid is also important in the detoxification and excretion of foreign organic compounds such as phenol. Enzymes involved in glucuronate metabolism include UDP-glucose dehydrogenase and glucuronate reductase.
Disaccharide Metabolism
Disaccharides must be hydrolyzed to monosaccharides to be digested. Lactose, a disaccharide found in milk, is hydrolyzed to galactose and glucose by the enzyme lactase. Maltose is derived from plant starch and is hydrolyzed to glucose by the enzyme maltase. Sucrose is derived from plants and is hydrolyzed to glucose and fructose by the enzyme sucrase. Trehalose, a disaccharide found mainly in insects and mushrooms, is hydrolyzed to glucose by the enzyme trehalase (OMIM *275360 Trehalase; Ruf, J. et al. (1990) J. Biol. Chem. 265:15034-15039). Lactase, maltase, sucrase, and trehalase are bound to mucosal cells lining the small intestine, where they participate in the digestion of dietary disaccharides. The enzyme lactose synthetase, composed of the catalytic subunit galactosyltransferase and the modifier subunit α-lactalbumin, converts UDP-galactose and glucose to lactose in the mammary glands.
Glycogen, Starch, and Chitin Metabolism
Glycogen is the storage form of carbohydrates in mammals. Mobilization of glycogen maintains glucose levels between meals and during muscular activity. Glycogen is stored mainly in the
liver and in skeletal muscle in the form of cytoplasmic granules. These granules contain enzymes that catalyze the synthesis and degradation of glycogen, as well as enzymes that regulate these processes. Enzymes that catalyze the degradation of glycogen include glycogen phosphorylase, a transferase, α-1,6-glucosidase, and phosphoglucomutase. Enzymes that catalyze the synthesis of glycogen include UDP-glucose pyrophosphorylase, glycogen synthetase, a branching enzyme, and nucleoside diphosphokinase. The enzymes of glycogen synthesis and degradation are tightly regulated by the hormones insulin, glucagon, and epinephrine. Starch, a plant-derived polysaccharide, is hydrolyzed to maltose, maltotriose, and α-dextrin by α-amylase, an enzyme secreted by the salivary glands and pancreas. Chitin is a polysaccharide found in insects and Crustacea. A chitotriosidase is secreted by macrophages and may play a role in the degradation of chitin-containing pathogens (Boot, R.G. et al. (1995) J. Biol. Chem. 270:26252-26256).
Peptidoglycans and Glycosaminoglycans
Glycosaminoglycans (GAGs) are anionic linear unbranched polysaccharides composed of repetitive disaccharide units. These repetitive units contain a derivative of an amino sugar, either glucosamine or galactosamine. GAGs exist free or as part of proteoglycans, large molecules composed of a core protein attached to one or more GAGs. GAGs are found on the cell surface, inside cells, and in the extracellular matrix. Changes in GAG levels are associated with several autoimmune diseases including autoimmune thyroid disease, autoimmune diabetes mellitus, and systemic lupus erythematosus (Hansen, C. et al. (1996) Clin. Exp. Rheum. 14 (Suppl. 15):S59-S67). GAGs include chondroitin sulfate, keratan sulfate, heparin, heparan sulfate, dermatan sulfate, and hyaluronan.
The GAG hyaluronan (HA) is found in the extracellular matrix of many cells, especially in soft connective tissues, and is abundant in synovial fluid (Pitsillides, A.A. et al. (1993) Int. J. Exp. Pathol. 74:27-34). HA seems to play important roles in cell regulation, development, and differentiation (Laurent, T.C. and J.R. Fraser (1992) FASEB J. 6:2397-2404). Hyaluronidase is an enzyme that degrades HA to oligosaccharides. Hyaluronidases may function in cell adhesion, infection, angiogenesis, signal transduction, reproduction, cancer, and inflammation.
Proteoglycans, also known as peptidoglycans, are found in the extracellular matrix of connective tissues such as cartilage and are essential for distributing the load in weight-bearing joints. Cell-surface-attached proteoglycans anchor cells to the extracellular matrix. Both extracellular and cell-surface proteoglycans bind growth factors, facilitating their binding to cell-surface receptors and subsequent triggering of signal transduction pathways.
Amino Acid and Nitrogen Metabolism
NH4+ is assimilated into amino acids by the actions of two enzymes, glutamate dehydrogenase and glutamine synthetase. The carbon skeletons of amino acids come from the intermediates of glycolysis, the pentose phosphate pathway, or the citric acid cycle. Of the twenty amino acids used in proteins, humans can synthesize only eleven (nonessential amino acids). The remaining nine must come from the diet (essential amino acids). Enzymes involved in nonessential amino acid biosynthesis include glutamate kinase dehydrogenase, pyrroline carboxylate reductase, asparagine synthetase, phenylalanine oxygenase, methionine adenosyltransferase, adenosylhomocysteinase, cystathionine β-synthase, cystathionine γ-lyase, phosphoglycerate dehydrogenase, phosphoserine transaminase, phosphoserine phosphatase, serine hydroxymethyltransferase, and glycine synthase.
Metabolism of amino acids takes place almost entirely in the liver, where the amino group is removed by aminotransferases (transaminases), for example, alanine aminotransferase. The amino group is transferred to α-ketoglutarate to form glutamate. Glutamate dehydrogenase converts glutamate to NH4+ and α-ketoglutarate. NH4+ is converted to urea by the urea cycle which is catalyzed by the enzymes arginase, ornithine transcarbamoylase, arginosuccinate synthetase, and arginosuccinase. Carbamoyl phosphate synthetase is also involved in urea formation. Enzymes involved in the metabolism of the carbon skeleton of amino acids include serine dehydratase, asparaginase, glutaminase, propionyl CoA carboxylase, methylmalonyl CoA mutase, branched-chain α-keto dehydrogenase complex, isovaleryl CoA dehydrogenase, β-methylcrotonyl CoA carboxylase, phenylalanine hydroxylase, p-hydroxylphenylpyruvate hydroxylase, and homogentisate oxidase.
Polyamines, which include spermidine, putrescine, and spermine, bind tightly to nucleic acids and are abundant in rapidly proliferating cells. Enzymes involved in polyamine synthesis include ornithine decarboxylase.
Diseases involved in amino acid and nitrogen metabolism include hyperammonemia, carbamoyl phosphate synthetase deficiency, urea cycle enzyme deficiencies, methylmalonic aciduria, maple syrup disease, alcaptonuria, and phenylketonuria.
Energy Metabolism
Cells derive energy from metabolism of ingested compounds that may be roughly categorized as carbohydrates, fats, or proteins. Energy is also stored in polymers such as triglycerides (fats) and glycogen (carbohydrates). Metabolism proceeds along separate reaction pathways connected by key intermediates such as acetyl coenzyme A (acetyl-CoA). Metabolic pathways feature anaerobic and aerobic degradation, coupled with energy-requiring reactions such as phosphorylation of adenosine diphosphate (ADP) to the triphosphate (ATP) or analogous phosphorylations of guanosine (GDP/GTP), uridine (UDP/UTP), or cytidine (CDP/CTP). Subsequent dephosphorylation of the triphosphate drives reactions needed for cell maintenance, growth, and proliferation.
Digestive enzymes convert carbohydrates and sugars to glucose; fructose and galactose are converted in the liver to glucose. Enzymes involved in these conversions include galactose-1-phosphate uridyl transferase and UDP-galactose-4-epimerase. In the cytoplasm, glycolysis converts glucose to pyruvate in a series of reactions coupled to ATP synthesis.
Pyruvate is transported into the mitochondria and converted to acetyl-CoA for oxidation via the citric acid cycle, involving pyruvate dehydrogenase components, dihydrolipoyl transacetylase, and dihydrolipoyl dehydrogenase. Enzymes involved in the citric acid cycle include: citrate synthetase, aconitases, isocitrate dehydrogenase, alpha-ketoglutarate dehydrogenase complex including transsuccinylases, succinyl CoA synthetase, succinate dehydrogenase, fumarases, and malate dehydrogenase. Acetyl CoA is oxidized to CO2 with concomitant formation of NADH, FADH2, and GTP. In oxidative phosphorylation, the transport of electrons from NADH and FADH2 to oxygen by dehydrogenases is coupled to the synthesis of ATP from ADP and Pi by the F0F1 ATPase complex in the mitochondrial inner membrane. Enzyme complexes responsible for electron transport and ATP synthesis include the F0F1 ATPase complex, ubiquinone(CoQ)-cytochrome c reductase, ubiquinone reductase, cytochrome b, cytochrome c1, FeS protein, and cytochrome c oxidase.
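For orientation, the oxidation of acetyl-CoA in the citric acid cycle described above can be summarized by the net reaction usually given in biochemistry textbooks. The stoichiometric coefficients below are the standard textbook values and are not stated in the text itself; they are offered as an assumption-labeled summary rather than as part of the original disclosure.

```latex
\[
\text{Acetyl-CoA} + 3\,\text{NAD}^{+} + \text{FAD} + \text{GDP} + \text{P}_i + 2\,\text{H}_2\text{O}
\;\longrightarrow\;
2\,\text{CO}_2 + 3\,\text{NADH} + \text{FADH}_2 + \text{GTP} + 2\,\text{H}^{+} + \text{CoA}
\]
```

The NADH and FADH2 formed in each turn of the cycle are the electron donors for the oxidative phosphorylation steps named in the preceding paragraph.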
Triglycerides are hydrolyzed to fatty acids and glycerol by lipases. Glycerol is then phosphorylated to glycerol-3-phosphate by glycerol kinase and glycerol phosphate dehydrogenase, and degraded by glycolysis. Fatty acids are transported into the mitochondria as fatty acyl-carnitine esters and undergo oxidative degradation.
In addition to metabolic disorders such as diabetes and obesity, disorders of energy metabolism are associated with cancers (Dorward, A. et al. (1997) J. Bioenerg. Biomembr. 29:385-392), autism (Lombard, J. (1998) Med. Hypotheses 50:497-500), neurodegenerative disorders (Alexi, T. et al. (1998) Neuroreport 9:R57-64), and neuromuscular disorders (DiMauro, S. et al. (1998) Biochim. Biophys. Acta 1366:199-210). The myocardium is heavily dependent on oxidative metabolism, so metabolic dysfunction often leads to heart disease (DiMauro, S. and M. Hirano (1998) Curr. Opin. Cardiol. 13:190-197). For a review of energy metabolism enzymes and intermediates, see Stryer, L. et al. (1995)
Biochemistry, W.H. Freeman and Co., San Francisco CA, pp. 443-652. For a review of energy metabolism regulation, see Lodish, H. et al. (1995) Molecular Cell Biology, Scientific American Books, New York NY, pp. 744-770.
Cofactor Metabolism
Cofactors, including coenzymes and prosthetic groups, are small molecular weight inorganic or organic compounds that are required for the action of an enzyme. Many cofactors contain vitamins as a component. Cofactors include thiamine pyrophosphate, flavin adenine dinucleotide, flavin mononucleotide, nicotinamide adenine dinucleotide, pyridoxal phosphate, coenzyme A, tetrahydrofolate, lipoamide, and heme. The vitamins biotin and cobalamin are associated with enzymes as well. Heme, a prosthetic group found in myoglobin and hemoglobin, consists of a protoporphyrin group bound to iron. Porphyrin groups contain four substituted pyrroles covalently joined in a ring, often with a bound metal atom. Enzymes involved in porphyrin synthesis include δ-aminolevulinate synthase, δ-aminolevulinate dehydrase, porphobilinogen deaminase, and cosynthase. Deficiencies in heme formation cause porphyrias. Heme is broken down as a part of erythrocyte turnover. Enzymes involved in heme degradation include heme oxygenase and biliverdin reductase.
Iron is a required cofactor for many enzymes. Besides the heme-containing enzymes, iron is found in iron-sulfur clusters in proteins including aconitase, succinate dehydrogenase, and NADH-Q reductase. Iron is transported in the blood by the protein transferrin. Binding of transferrin to the transferrin receptor on cell surfaces allows uptake by receptor-mediated endocytosis. Cytosolic iron is bound to ferritin protein.
A molybdenum-containing cofactor (molybdopterin) is found in enzymes including sulfite oxidase, xanthine dehydrogenase, and aldehyde oxidase. Molybdopterin biosynthesis is performed by two molybdenum cofactor synthesizing enzymes. Deficiencies in these enzymes cause mental retardation and lens dislocation. Other diseases caused by defects in cofactor metabolism include pernicious anemia and methylmalonic aciduria.
Secretion and Trafficking
Eukaryotic cells are bound by a lipid bilayer membrane and subdivided into functionally distinct, membrane-bound compartments. The membranes maintain the essential differences between the cytosol, the extracellular environment, and the lumenal space of each intracellular organelle. As lipid membranes are highly impermeable to most polar molecules, transport of essential nutrients, metabolic waste products, cell signaling molecules, macromolecules and proteins across lipid membranes and between organelles must be mediated by a variety of transport-associated molecules.
Protein Trafficking
In eukaryotes, some proteins are synthesized on ER-bound ribosomes, co-translationally imported into the ER, delivered from the ER to the Golgi complex for post-translational processing and sorting, and transported from the Golgi to specific intracellular and extracellular destinations. All cells possess a constitutive transport process which maintains homeostasis between the cell and its environment. In many differentiated cell types, the basic machinery is modified to carry out specific transport functions. For example, in endocrine glands, hormones and other secreted proteins are packaged into secretory granules for regulated exocytosis to the cell exterior. In macrophages, foreign extracellular material is engulfed (phagocytosis) and delivered to lysosomes for degradation. In fat and muscle cells, glucose transporters are stored in vesicles which fuse with the plasma membrane only in response to insulin stimulation.
The Secretory Pathway
Synthesis of most integral membrane proteins, secreted proteins, and proteins destined for the lumen of a particular organelle occurs on ER-bound ribosomes. These proteins are co-translationally imported into the ER. The proteins leave the ER via membrane-bound vesicles which bud off the ER at specific sites and fuse with each other (homotypic fusion) to form the ER-Golgi Intermediate Compartment (ERGIC). The ERGIC matures progressively through the cis, medial, and trans cisternal stacks of the Golgi, modifying the enzyme composition by retrograde transport of specific
Golgi enzymes. In this way, proteins moving through the Golgi undergo post-translational modification, such as glycosylation. The final Golgi compartment is the Trans-Golgi Network (TGN), where both membrane and lumenal proteins are sorted for their final destination. Transport vesicles destined for intracellular compartments, such as the lysosome, bud off the TGN. What remains is a secretory vesicle which contains proteins destined for the plasma membrane, such as receptors, adhesion molecules, and ion channels, and secretory proteins, such as hormones, neurotransmitters, and digestive enzymes. Secretory vesicles eventually fuse with the plasma membrane (Glick, B.S. and V. Malhotra (1998) Cell 95:883-889).
The secretory process can be constitutive or regulated. Most cells have a constitutive pathway for secretion, whereby vesicles derived from maturation of the TGN require no specific signal to fuse with the plasma membrane. In many cells, such as endocrine cells, digestive cells, and neurons, vesicle pools derived from the TGN collect in the cytoplasm and do not fuse with the plasma membrane until they are directed to by a specific signal.
Endocytosis
Endocytosis, wherein cells internalize material from the extracellular environment, is essential for transmission of neuronal, metabolic, and proliferative signals; uptake of many essential nutrients; and defense against invading organisms. Most cells exhibit two forms of endocytosis. The first, phagocytosis, is an actin-driven process exemplified in macrophages and neutrophils. Material to be endocytosed contacts numerous cell surface receptors which stimulate the plasma membrane to extend and surround the particle, enclosing it in a membrane-bound phagosome. In the mammalian immune system, IgG-coated particles bind Fc receptors on the surface of phagocytic leukocytes. Activation of the Fc receptors initiates a signal cascade involving src-family cytosolic kinases and the monomeric GTP-binding (G) protein Rho. The resulting actin reorganization leads to phagocytosis of the particle. This process is an important component of the humoral immune response, allowing the processing and presentation of bacterial-derived peptides to antigen-specific T-lymphocytes.
The second form of endocytosis, pinocytosis, is a more generalized uptake of material from the external milieu. Like phagocytosis, pinocytosis is activated by ligand binding to cell surface receptors. Activation of individual receptors stimulates an internal response that includes coalescence of the receptor-ligand complexes and formation of clathrin-coated pits. Invagination of the plasma membrane at clathrin-coated pits produces an endocytic vesicle within the cell cytoplasm. These vesicles undergo homotypic fusion to form an early endosomal (EE) compartment. The tubulovesicular EE serves as a sorting site for incoming material. ATP-driven proton pumps in the EE membrane lower the pH of the EE lumen (pH 6.3-6.8). The acidic environment causes many ligands to dissociate from their receptors. The receptors, along with membrane and other integral membrane proteins, are recycled back to the plasma membrane by budding off the tubular extensions of the EE in recycling vesicles (RV). This selective removal of recycled components produces a carrier vesicle containing ligand and other material from the external environment. The carrier vesicle fuses with TGN-derived vesicles which contain hydrolytic enzymes. The acidic environment of the resulting late endosome (LE) activates the hydrolytic enzymes which degrade the ligands and other material. As digestion takes place, the LE fuses with the lysosome where digestion is completed (Mellman, I. (1996) Annu. Rev. Cell Dev. Biol. 12:575-625).
Recycling vesicles may return directly to the plasma membrane. Receptors internalized and returned directly to the plasma membrane have a turnover rate of 2-3 minutes. Some RVs undergo microtubule-directed relocation to a perinuclear site, from which they then return to the plasma membrane. Receptors following this route have a turnover rate of 5-10 minutes. Still other RVs are retained within the cell until an appropriate signal is received (Mellman, supra; and James, D.E. et al. (1994) Trends Cell Biol. 4:120-126).
Vesicle Formation
Several steps in the transit of material along the secretory and endocytic pathways require the formation of transport vesicles. Specifically, vesicles form at the transitional endoplasmic reticulum (tER), the rim of Golgi cisternae, the face of the Trans-Golgi Network (TGN), the plasma membrane (PM), and tubular extensions of the endosomes. The process begins with the budding of a vesicle out of the donor membrane. The membrane-bound vesicle contains proteins to be transported and is surrounded by a protective coat made up of protein subunits recruited from the cytosol. The initial budding and coating processes are controlled by a cytosolic ras-like GTP-binding protein, ADP-ribosylating factor (Arf), and adapter proteins (AP). Different isoforms of both Arf and AP are involved at different sites of budding. Another small G-protein, dynamin, forms a ring complex around the neck of the forming vesicle and may provide the mechanochemical force to accomplish the final step of the budding process. The coated vesicle complex is then transported through the cytosol. During the transport process, Arf-bound GTP is hydrolyzed to GDP and the coat dissociates from the transport vesicle (West, M.A. et al. (1997) J. Cell Biol. 138:1239-1254). Two different classes of coat protein have also been identified. Clathrin coats form on the TGN and PM surfaces, whereas coatomer or COP coats form on the ER and Golgi. COP coats can further be distinguished as COPI, involved in retrograde traffic through the Golgi and from the Golgi to the ER, and COPII, involved in anterograde traffic from the ER to the Golgi (Mellman, supra). The COP coat consists of two major components, a G-protein (Arf or Sar) and coat protomer (coatomer). Coatomer is an equimolar complex of seven proteins, termed alpha-, beta-, beta'-, gamma-, delta-, epsilon- and zeta-COP (Harter, C. and F.T. Wieland (1998) Proc. Natl. Acad. Sci. USA 95:11649-11654).
Membrane Fusion
Transport vesicles undergo homotypic or heterotypic fusion in the secretory and endocytotic pathways. Molecules required for appropriate targeting and fusion of vesicles with their target membrane include proteins incorporated in the vesicle membrane, the target membrane, and proteins recruited from the cytosol. During budding of the vesicle from the donor compartment, an integral membrane protein, VAMP (vesicle-associated membrane protein), is incorporated into the vesicle. Soon after the vesicle uncoats, a cytosolic prenylated GTP-binding protein, Rab (a member of the Ras superfamily), is inserted into the vesicle membrane. GTP-bound Rab proteins are directed into nascent transport vesicles where they interact with VAMP. Following vesicle transport, GTPase activating proteins (GAPs) in the target membrane convert Rab proteins to the GDP-bound form. A cytosolic protein, guanine-nucleotide dissociation inhibitor (GDI), helps return GDP-bound Rab proteins to their membrane of origin. Several Rab isoforms have been identified and appear to associate with specific compartments within the cell. Rab proteins appear to play a role in mediating the function of a viral gene, Rev, which is essential for replication of HIV-1, the virus responsible for AIDS (Flavell, R.A. et al. (1996) Proc. Natl. Acad. Sci. USA 93:4421-4424).
Docking of the transport vesicle with the target membrane involves the formation of a complex between the vesicle SNAP receptor (v-SNARE), target membrane (t-) SNAREs, and certain other membrane and cytosolic proteins. Many of these other proteins have been identified although their exact functions in the docking complex remain uncertain (Tellam, J.T. et al. (1995) J. Biol. Chem. 270:5857-5863; and Hata, Y. and T.C. Sudhof (1995) J. Biol. Chem. 270:13022-13028). N-ethylmaleimide sensitive factor (NSF) and soluble NSF-attachment protein (α-SNAP and β-SNAP) are two such proteins that are conserved from yeast to man and function in most intracellular membrane fusion reactions. Sec1 represents a family of yeast proteins that function at many different stages in the secretory pathway including membrane fusion. Recently, mammalian homologs of Sec1, called Munc-18 proteins, have been identified (Katagiri, H. et al. (1995) J. Biol. Chem. 270:4963-4966; Hata et al. supra).
The SNARE complex involves three SNARE molecules, one in the vesicular membrane and two in the target membrane. Synaptotagmin is an integral membrane protein in the synaptic vesicle which associates with the t-SNARE syntaxin in the docking complex. Synaptotagmin binds calcium in a complex with negatively charged phospholipids, which allows the cytosolic SNAP protein to displace synaptotagmin from syntaxin and fusion to occur. Thus, synaptotagmin is a negative regulator of fusion in the neuron (Littleton, J.T. et al. (1993) Cell 74:1125-1134). The most abundant membrane protein of synaptic vesicles appears to be the glycoprotein synaptophysin, a 38 kDa protein with four transmembrane domains.
Specificity between a vesicle and its target is derived from the v-SNARE, t-SNAREs, and associated proteins involved. Different isoforms of SNAREs and Rabs show distinct cellular and subcellular distributions. VAMP-1/synaptobrevin, membrane-anchored synaptosome-associated protein of 25 kDa (SNAP-25), syntaxin-1, Rab3A, Rab15, and Rab23 are predominantly expressed in the brain and nervous system. Different syntaxin, VAMP, and Rab proteins are associated with distinct subcellular compartments and their vesicular carriers.
Nuclear Transport
Transport of proteins and RNA between the nucleus and the cytoplasm occurs through nuclear pore complexes (NPCs). NPC-mediated transport occurs in both directions through the nuclear envelope. All nuclear proteins are imported from the cytoplasm, their site of synthesis. tRNA and mRNA are exported from the nucleus, their site of synthesis, to the cytoplasm, their site of function. Processing of small nuclear RNAs involves export into the cytoplasm, assembly with proteins and modifications such as hypermethylation to produce small nuclear ribonucleoproteins (snRNPs), and subsequent import of the snRNPs back into the nucleus. The assembly of ribosomes requires the initial import of ribosomal proteins from the cytoplasm, their incorporation with RNA into ribosomal subunits, and export back to the cytoplasm (Görlich, D. and I.W. Mattaj (1996) Science 271:1513-1518).
The transport of proteins and mRNAs across the NPC is selective, dependent on nuclear localization signals, and generally requires association with nuclear transport factors. Nuclear localization signals (NLS) consist of short stretches of amino acids enriched in basic residues. NLS are found on proteins that are targeted to the nucleus, such as the glucocorticoid receptor. The NLS is recognized by the NLS receptor, importin, which then interacts with the monomeric GTP-binding protein Ran.
This NLS protein/receptor/Ran complex navigates the nuclear pore with the help of the homodimeric protein nuclear transport factor 2 (NTF2). NTF2 binds to the GDP-bound form of Ran and to multiple proteins of the nuclear pore complex containing FXFG repeat motifs, such as p62 (Paschal, B. et al. (1997) J. Biol. Chem. 272:21534-21539; and Wong, D.H. et al. (1997) Mol. Cell Biol. 17:3755-3767). Some proteins are dissociated before nuclear mRNAs are transported across the NPC, while others are dissociated shortly after nuclear mRNA transport across the NPC and are reimported into the nucleus.
Disease Correlation
The etiology of numerous human diseases and disorders can be attributed to defects in the transport or secretion of proteins. For example, abnormal hormonal secretion is linked to disorders such as diabetes insipidus (vasopressin), hyper- and hypoglycemia (insulin, glucagon), Graves' disease and goiter (thyroid hormone), and Cushing's and Addison's diseases (adrenocorticotropic hormone,
ACTH). Moreover, cancer cells secrete excessive amounts of hormones or other biologically active peptides. Disorders related to excessive secretion of biologically active peptides by tumor cells include fasting hypoglycemia due to increased insulin secretion from insulinoma-islet cell tumors; hypertension due to increased epinephrine and norepinephrine secreted from pheochromocytomas of the adrenal medulla and sympathetic paraganglia; and carcinoid syndrome, which is characterized by abdominal cramps, diarrhea, and valvular heart disease caused by excessive amounts of vasoactive substances such as serotonin, bradykinin, histamine, prostaglandins, and polypeptide hormones, secreted from intestinal tumors. Biologically active peptides that are ectopically synthesized in and secreted from tumor cells include ACTH and vasopressin (lung and pancreatic cancers); parathyroid hormone (lung and bladder cancers); calcitonin (lung and breast cancers); and thyroid-stimulating hormone (medullary thyroid carcinoma). Such peptides may be useful as diagnostic markers for tumorigenesis (Schwartz, M.Z. (1997) Semin. Pediatr. Surg. 3:141-146; and Said, S.I. and G.R. Faloona (1975) N. Engl. J. Med. 293:155-160).
Defective nuclear transport may play a role in cancer. The BRCA1 protein contains three potential NLSs which interact with importin alpha, and is transported into the nucleus by the importin/NPC pathway. In breast cancer cells the BRCA1 protein is aberrantly localized in the cytoplasm. The mislocation of the BRCA1 protein in breast cancer cells may be due to a defect in the NPC nuclear import pathway (Chen, C.F. et al. (1996) J. Biol. Chem. 271:32863-32868). It has been suggested that in some breast cancers, the tumor-suppressing activity of p53 is inactivated by the sequestration of the protein in the cytoplasm, away from its site of action in the cell nucleus. Cytoplasmic wild-type p53 was also found in human cervical carcinoma cell lines. (Moll, U.M. et al. (1992) Proc. Natl. Acad. Sci.
USA 89:7262-7266; and Liang, X.H. et al. (1993) Oncogene 8:2645-2652.)
Environmental Responses
Organisms respond to the environment by a number of pathways. Heat shock proteins, including hsp70, hsp60, hsp90, and hsp40, assist organisms in coping with heat damage to cellular proteins.
Aquaporins (AQP) are channels that transport water and, in some cases, nonionic small solutes such as urea and glycerol. Water movement is important for a number of physiological processes including renal fluid filtration, aqueous humor generation in the eye, cerebrospinal fluid production in the brain, and appropriate hydration of the lung. Aquaporins are members of the major intrinsic protein (MIP) family of membrane transporters (King, L.S. and P. Agre (1996) Annu. Rev. Physiol. 58:619-648; Ishibashi, K. et al. (1997) J. Biol. Chem. 272:20782-20786). The study of aquaporins may have relevance to understanding edema formation and fluid balance in both normal physiology and disease states (King, supra). Mutations in AQP2 cause autosomal recessive nephrogenic diabetes insipidus (OMIM *107777 Aquaporin 2; AQP2). Reduced AQP4 expression in skeletal muscle may be associated with Duchenne muscular dystrophy (Frigeri, A. et al. (1998) J. Clin. Invest. 102:695-703). Mutations in AQP0 cause autosomal dominant cataracts in the mouse (OMIM *154050 Major Intrinsic Protein of Lens Fiber; MIP).
The metallothioneins (MTs) are a group of small (61 amino acids), cysteine-rich proteins that bind heavy metals such as cadmium, zinc, mercury, lead, and copper and are thought to play a role in metal detoxification or the metabolism and homeostasis of metals. Arsenite-resistance proteins have been identified in hamsters that are resistant to toxic levels of arsenite (Rossman, T.G. et al. (1997) Mutat. Res. 386:307-314).
Humans respond to light and odors by specific protein pathways. Proteins involved in light perception include rhodopsin, transducin, and cGMP phosphodiesterase. Proteins involved in odor perception include multiple olfactory receptors. Other proteins are important in human circadian rhythms and responses to wounds.
Immunity and Host Defense
All vertebrates have developed sophisticated and complex immune systems that provide protection from viral, bacterial, fungal and parasitic infections. Included in these systems are the processes of humoral immunity, the complement cascade and the inflammatory response (Paul, W.E. (1993) Fundamental Immunology, Raven Press, Ltd., New York NY, pp.1-20).
The cellular components of the humoral immune system include six different types of leukocytes: monocytes, lymphocytes, polymorphonuclear granulocytes (consisting of neutrophils, eosinophils, and basophils) and plasma cells. Additionally, fragments of megakaryocytes, a seventh type of white blood cell in the bone marrow, occur in large numbers in the blood as platelets.
Leukocytes are formed from two stem cell lineages in bone marrow. The myeloid stem cell line produces granulocytes and monocytes, and the lymphoid stem cell line produces lymphocytes.
Lymphoid cells travel to the thymus, spleen and lymph nodes, where they mature and differentiate into lymphocytes. Leukocytes are responsible for defending the body against invading pathogens. Neutrophils and monocytes attack invading bacteria, viruses, and other pathogens and destroy them by phagocytosis. Monocytes enter tissues and differentiate into macrophages which are extremely phagocytic. Lymphocytes and plasma cells are a part of the immune system which recognizes specific foreign molecules and organisms and inactivates them, as well as signals other cells to attack the invaders.
Granulocytes and monocytes are formed and stored in the bone marrow until needed. Megakaryocytes are produced in bone marrow, where they fragment into platelets and are released into the bloodstream. The main function of platelets is to activate the blood clotting mechanism. Lymphocytes and plasma cells are produced in various lymphogenous organs, including the lymph nodes, spleen, thymus, and tonsils.
Both neutrophils and macrophages exhibit chemotaxis towards sites of inflammation. Tissue inflammation in response to pathogen invasion results in production of chemo-attractants for leukocytes, such as endotoxins or other bacterial products, prostaglandins, and products of leukocytes or platelets.
Basophils participate in the release of the chemicals involved in the inflammatory process. The main function of basophils is secretion of these chemicals to such a degree that they have been referred to as "unicellular endocrine glands." A distinct aspect of basophilic secretion is that the contents of granules go directly into the extracellular environment, not into vacuoles as occurs with neutrophils, eosinophils and monocytes. Basophils have receptors for the Fc fragment of immunoglobulin E (IgE) that are not present on other leukocytes. Crosslinking of membrane IgE with anti-IgE or other ligands triggers degranulation.
Eosinophils are bi- or multi-nucleated white blood cells which contain eosinophilic granules. Their plasma membrane is characterized by Ig receptors, particularly IgG and IgE. Generally, eosinophils are stored in the bone marrow until recruited for use at a site of inflammation or invasion. They have specific functions in parasitic infections and allergic reactions, and are thought to detoxify some of the substances released by mast cells and basophils which cause inflammation. Additionally, they phagocytize antigen-antibody complexes and further help prevent spread of the inflammation.
Macrophages are monocytes that have left the blood stream to settle in tissue. Once monocytes have migrated into tissues, they do not re-enter the bloodstream. The mononuclear phagocyte system is composed of precursor cells in the bone marrow, monocytes in circulation, and macrophages in tissues. The system is capable of very fast and extensive phagocytosis. A macrophage may phagocytize over 100 bacteria, digest them and extrude residues, and then survive for many more months. Macrophages are also capable of ingesting large particles, including red blood cells and malarial parasites. Monocytes that enter a tissue increase several-fold in size and transform into macrophages that are characteristic of that tissue, surviving there for several months.
Mononuclear phagocytes are essential in defending the body against invasion by foreign pathogens, particularly intracellular microorganisms such as M. tuberculosis, listeria, leishmania and toxoplasma. Macrophages can also control the growth of tumorous cells, via both phagocytosis and secretion of hydrolytic enzymes. Another important function of macrophages is that of processing antigens and presenting them in a biochemically modified form to lymphocytes.
The immune system responds to invading microorganisms in two major ways: antibody production and cell-mediated responses. Antibodies are immunoglobulin proteins produced by B-lymphocytes which bind to specific antigens and cause inactivation or promote destruction of the antigen by other cells. Cell-mediated immune responses involve T-lymphocytes (T cells) that react with foreign antigen on the surface of infected host cells. Depending on the type of T cell, the infected cell is either killed or signals are secreted which activate macrophages and other cells to destroy the infected cell (Paul, supra).
T-lymphocytes originate in the bone marrow or liver in fetuses. Precursor cells migrate via the blood to the thymus, where they are processed to mature into T-lymphocytes. This processing is crucial because of positive and negative selection of T cells that will react with foreign antigen and not with self molecules. After processing, T cells continuously circulate in the blood and secondary lymphoid tissues, such as lymph nodes, spleen, certain epithelium-associated tissues in the gastrointestinal tract, respiratory tract and skin. When T-lymphocytes are presented with the complementary antigen, they are stimulated to proliferate and release large numbers of activated T cells into the lymph system and the blood system. These activated T cells can survive and circulate for several days. At the same time, T memory cells are created, which remain in the lymphoid tissue for months or years. Upon subsequent exposure to that specific antigen, these memory cells will respond more rapidly and with a stronger response than induced by the original antigen. This creates an "immunological memory" that can provide immunity for years.
There are two major types of T cells: cytotoxic T cells destroy infected host cells, and helper T cells activate other white blood cells via chemical signals. One class of helper cell, TH1, activates macrophages to destroy ingested microorganisms, while another, TH2, stimulates the production of antibodies by B cells.
Cytotoxic T cells directly attack the infected target cell. In virus-infected cells, peptides derived from viral proteins are generated by the proteasome. These peptides are transported into the ER by the transporter associated with antigen processing (TAP) (Pamer, E. and P. Cresswell (1998) Annu. Rev. Immunol. 16:323-358). Once inside the ER, the peptides bind MHC I chains, and the peptide/MHC I complex is transported to the cell surface. Receptors on the surface of T cells bind to antigen presented on cell surface MHC molecules. Once activated by binding to antigen, T cells secrete γ-interferon, a signal molecule that induces the expression of genes necessary for presenting viral (or other) antigens to cytotoxic T cells. Cytotoxic T cells kill the infected cell by stimulating programmed cell death.
Helper T cells constitute up to 75% of the total T cell population. They regulate the immune functions by producing a variety of lymphokines that act on other cells in the immune system and on bone marrow. Among these lymphokines are interleukins 2, 3, 4, 5, and 6; granulocyte-monocyte colony stimulating factor; and γ-interferon.
Helper T cells are required for most B cells to respond to antigen. When an activated helper cell contacts a B cell, its centrosome and Golgi apparatus become oriented toward the B cell, aiding the directing of signal molecules, such as the transmembrane-bound protein called CD40 ligand, onto the B cell surface to interact with the CD40 transmembrane protein. Secreted signals also help B cells to proliferate and mature and, in some cases, to switch the class of antibody being produced.
B-lymphocytes (B cells) produce antibodies which react with specific antigenic proteins presented by pathogens. Once activated, B cells become filled with extensive rough endoplasmic reticulum and are known as plasma cells. As with T cells, interaction of B cells with antigen stimulates proliferation of only those B cells which produce antibody specific to that antigen. There are five classes of antibodies, known as immunoglobulins, which together comprise about 20% of total plasma protein. Each class mediates a characteristic biological response after antigen binding. Upon activation by specific antigen, B cells switch from making membrane-bound antibody to secretion of that antibody.
Antibodies, or immunoglobulins (Ig), are the founding members of the Ig superfamily and the central components of the humoral immune response. Antibodies are either expressed on the surface of B cells or secreted by B cells into the circulation. Antibodies bind and neutralize blood-borne foreign antigens. The prototypical antibody is a tetramer consisting of two identical heavy polypeptide chains (H-chains) and two identical light polypeptide chains (L-chains) interlinked by disulfide bonds. This arrangement confers the characteristic Y-shape to antibody molecules. Antibodies are classified based on their H-chain composition. The five antibody classes, IgA, IgD, IgE, IgG and IgM, are defined by the α, δ, ε, γ, and μ H-chain types. There are two types of L-chains, κ and λ, either of which may associate as a pair with any H-chain pair. IgG, the most common class of antibody found in the circulation, is tetrameric, while the other classes of antibodies are generally variants or multimers of this basic structure.
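The classification just described can be written out as a small lookup table. The following Python sketch is illustrative only and is not part of the disclosure: the names `HEAVY_CHAIN_BY_CLASS`, `LIGHT_CHAIN_TYPES`, and `tetramer_compositions` are hypothetical, and the sketch models only the prototypical two-heavy, two-light tetramer, not the multimeric variants formed by some classes.

```python
# Illustrative sketch only; all names here are hypothetical.
# Heavy-chain type that defines each of the five antibody classes.
HEAVY_CHAIN_BY_CLASS = {
    "IgA": "alpha",
    "IgD": "delta",
    "IgE": "epsilon",
    "IgG": "gamma",
    "IgM": "mu",
}

# The two light-chain types; either may pair with any heavy-chain pair.
LIGHT_CHAIN_TYPES = ("kappa", "lambda")


def tetramer_compositions(ig_class):
    """Return the possible chain compositions of the basic tetramer.

    Each composition is a tuple of two identical heavy chains followed
    by two identical light chains.
    """
    heavy = HEAVY_CHAIN_BY_CLASS[ig_class]
    return [(heavy, heavy, light, light) for light in LIGHT_CHAIN_TYPES]


if __name__ == "__main__":
    for ig_class in sorted(HEAVY_CHAIN_BY_CLASS):
        print(ig_class, tetramer_compositions(ig_class))
```

Five heavy-chain types combined with two light-chain types give ten possible basic tetramer compositions under this simplified model.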
H-chains and L-chains each contain an N-terminal variable region and a C-terminal constant region. Both H-chains and L-chains contain repeated Ig domains. For example, a typical H-chain contains four Ig domains, three of which occur within the constant region and one of which occurs within the variable region and contributes to the formation of the antigen recognition site. Likewise, a typical L-chain contains two Ig domains, one of which occurs within the constant region and one of which occurs within the variable region. In addition, H-chains such as μ have been shown to associate with other polypeptides during differentiation of the B cell.
Antibodies can be described in terms of their two main functional domains. Antigen recognition is mediated by the Fab (antigen binding fragment) region of the antibody, while effector functions are mediated by the Fc (crystallizable fragment) region. Binding of antibody to an antigen, such as a bacterium, triggers the destruction of the antigen by phagocytic white blood cells such as macrophages and neutrophils. These cells express surface receptors that specifically bind to the antibody Fc region and allow the phagocytic cells to engulf, ingest, and degrade the antibody-bound antigen. The Fc receptors expressed by phagocytic cells are single-pass transmembrane glycoproteins of about 300 to 400 amino acids (Sears, D.W. et al. (1990) J. Immunol. 144:371-378). The extracellular portion of the Fc receptor typically contains two or three Ig domains.
Diseases which cause over- or under-abundance of any one type of leukocyte usually result in the entire immune defense system becoming involved. A well-known immunodeficiency disease is AIDS (Acquired Immunodeficiency Syndrome), where the number of helper T cells is depleted, leaving the patient susceptible to infection by microorganisms and parasites. Another widespread medical condition attributable to the immune system is that of allergic reactions to certain antigens. Allergic reactions include hay fever, asthma, anaphylaxis, and urticaria (hives). Leukemias are an excess production of white blood cells, to the point where a major portion of the body's metabolic resources are directed solely at proliferation of white blood cells, leaving other tissues to starve. Leukopenia or agranulocytosis occurs when the bone marrow stops producing white blood cells. This leaves the body unprotected against foreign microorganisms, including those which normally inhabit skin, mucous membranes, and gastrointestinal tract. If all white blood cell production stops completely, infection will occur within two days and death may follow only 1 to 4 days later.
Impaired phagocytosis occurs in several diseases, including monocytic leukemia, systemic lupus, and granulomatous disease. In such a situation, macrophages can phagocytize normally, but the enveloped organism is not killed. A defect in the plasma membrane enzyme which converts oxygen to lethally reactive forms results in abscess formation in liver, lungs, spleen, lymph nodes, and beneath the skin. Eosinophilia is an excess of eosinophils commonly observed in patients with allergies (hay fever, asthma), allergic reactions to drugs, rheumatoid arthritis, and cancers (Hodgkin's disease, lung, and liver cancer) (Isselbacher, K.J. et al. (1994) Harrison's Principles of Internal Medicine, McGraw-Hill, Inc., New York NY).
Host defense is further augmented by the complement system. The complement system serves as an effector system and is involved in infectious agent recognition. It can function as an independent immune network or in conjunction with other humoral immune responses. The complement system is composed of numerous plasma and membrane proteins that act in a cascade of reaction sequences whereby one component activates the next. The result is a rapid and amplified response to infection through either an inflammatory response or increased phagocytosis. The complement system has more than 30 protein components which can be divided into functional groupings including modified serine proteases, membrane-binding proteins and regulators of complement activation. Activation occurs through two different pathways: the classical and the alternative. Both pathways serve to destroy infectious agents through distinct triggering mechanisms that eventually merge with the involvement of the component C3. The classical pathway requires antibody binding to infectious agent antigens. The antibodies serve to define the target and initiate the complement system cascade, culminating in the destruction of the infectious agent. In this pathway, since the antibody guides initiation of the process, the complement can be seen as an effector arm of the humoral immune system.
The alternative pathway of the complement system does not require the presence of pre-existing antibodies for targeting infectious agent destruction. Rather, this pathway, through low levels of an activated component, remains constantly primed and provides surveillance in the non-immune host to enable targeting and destruction of infectious agents. In this case foreign material triggers the cascade, thereby facilitating phagocytosis or lysis (Paul, supra, pp.918-919).
Another important component of host defense is the process of inflammation. Inflammatory responses are divided into four categories on the basis of pathology and include allergic inflammation, cytotoxic antibody mediated inflammation, immune complex mediated inflammation and monocyte mediated inflammation. Inflammation manifests as a combination of each of these forms with one predominating.
Allergic acute inflammation is observed in individuals wherein specific antigens stimulate IgE antibody production. Mast cells and basophils are subsequently activated by the attachment of antigen-IgE complexes, resulting in the release of cytoplasmic granule contents such as histamine. The products of activated mast cells can increase vascular permeability and constrict the smooth muscle of breathing passages, resulting in anaphylaxis or asthma. Acute inflammation is also mediated by cytotoxic antibodies and can result in the destruction of tissue through the binding of complement-fixing antibodies to cells. The responsible antibodies are of the IgG or IgM types. Resultant clinical disorders include autoimmune hemolytic anemia and thrombocytopenia as associated with systemic lupus erythematosus.
Immune complex mediated acute inflammation involves the IgG or IgM antibody types which combine with antigen to activate the complement cascade. When such immune complexes bind to neutrophils and macrophages they activate the respiratory burst to form protein- and vessel-damaging agents such as hydrogen peroxide, hydroxyl radical, hypochlorous acid, and chloramines. Clinical manifestations include rheumatoid arthritis and systemic lupus erythematosus.
In chronic inflammation or delayed-type hypersensitivity, macrophages are activated and process antigen for presentation to T cells that subsequently produce lymphokines and monokines. This type of inflammatory response is likely important for defense against intracellular parasites and certain viruses. Clinical associations include granulomatous disease, tuberculosis, leprosy, and sarcoidosis (Paul, W.E., supra, pp. 1017-1018).
Extracellular Information Transmission Molecules
Intercellular communication is essential for the growth and survival of multicellular organisms, and in particular, for the function of the endocrine, nervous, and immune systems. In addition, intercellular communication is critical for developmental processes such as tissue construction and organogenesis, in which cell proliferation, cell differentiation, and morphogenesis must be spatially and temporally regulated in a precise and coordinated manner. Cells communicate with one another through the secretion and uptake of diverse types of signaling molecules such as hormones, growth factors, neuropeptides, and cytokines.
Hormones
Hormones are signaling molecules that coordinately regulate basic physiological processes from embryogenesis throughout adulthood. These processes include metabolism, respiration, reproduction, excretion, fetal tissue differentiation and organogenesis, growth and development, homeostasis, and the stress response. Hormonal secretions and the nervous system are tightly integrated and interdependent. Hormones are secreted by endocrine glands, primarily the hypothalamus and pituitary, the thyroid and parathyroid, the pancreas, the adrenal glands, and the ovaries and testes. The secretion of hormones into the circulation is tightly controlled. Hormones are often secreted in diurnal, pulsatile, and cyclic patterns. Hormone secretion is regulated by perturbations in blood biochemistry, by other upstream-acting hormones, by neural impulses, and by negative feedback loops. Blood hormone concentrations are constantly monitored and adjusted to maintain optimal, steady-state levels. Once secreted, hormones act only on those target cells that express specific receptors.
Most disorders of the endocrine system are caused by either hyposecretion or hypersecretion of hormones. Hyposecretion often occurs when a hormone's gland of origin is damaged or otherwise impaired. Hypersecretion often results from the proliferation of tumors derived from hormone-secreting cells. Inappropriate hormone levels may also be caused by defects in regulatory feedback loops or in the processing of hormone precursors. Endocrine malfunction may also occur when the target cell fails to respond to the hormone.
Hormones can be classified biochemically as polypeptides, steroids, eicosanoids, or amines. Polypeptides, which include diverse hormones such as insulin and growth hormone, vary in size and function and are often synthesized as inactive precursors that are processed intracellularly into mature, active forms. Amines, which include epinephrine and dopamine, are amino acid derivatives that function in neuroendocrine signaling. Steroids, which include the cholesterol-derived hormones estrogen and testosterone, function in sexual development and reproduction. Eicosanoids, which include prostaglandins and prostacyclins, are fatty acid derivatives that function in a variety of processes. Most polypeptides and some amines are soluble in the circulation where they are highly susceptible to proteolytic degradation within seconds after their secretion. Steroids and lipids are insoluble and must be transported in the circulation by carrier proteins. The following discussion will focus primarily on polypeptide hormones.
Hormones secreted by the hypothalamus and pituitary gland play a critical role in endocrine function by coordinately regulating hormonal secretions from other endocrine glands in response to neural signals. Hypothalamic hormones include thyrotropin-releasing hormone, gonadotropin-releasing hormone, somatostatin, growth-hormone releasing factor, corticotropin-releasing hormone, substance P, dopamine, and prolactin-releasing hormone. These hormones directly regulate the secretion of hormones from the anterior lobe of the pituitary. Hormones secreted by the anterior pituitary include adrenocorticotropic hormone (ACTH), melanocyte-stimulating hormone, somatotropic hormones such as growth hormone and prolactin, glycoprotein hormones such as thyroid-stimulating hormone, luteinizing hormone (LH), and follicle-stimulating hormone (FSH), β-lipotropin, and β-endorphins. These hormones regulate hormonal secretions from the thyroid, pancreas, and adrenal glands, and act directly on the reproductive organs to stimulate ovulation and spermatogenesis. The posterior pituitary synthesizes and secretes antidiuretic hormone (ADH, vasopressin) and oxytocin. Disorders of the hypothalamus and pituitary often result from lesions such as primary brain tumors, adenomas, infarction associated with pregnancy, hypophysectomy, aneurysms, vascular malformations, thrombosis, infections, immunological disorders, and complications due to head trauma. Such disorders have profound effects on the function of other endocrine glands. Disorders associated with hypopituitarism include hypogonadism, Sheehan syndrome, diabetes insipidus, Kallman's disease, Hand-Schuller-Christian disease, Letterer-Siwe disease, sarcoidosis, empty sella syndrome, and dwarfism. Disorders associated with hyperpituitarism include acromegaly, giantism, and syndrome of inappropriate ADH secretion (SIADH), often caused by benign adenomas.
Hormones secreted by the thyroid and parathyroid primarily control metabolic rates and the regulation of serum calcium levels, respectively. Thyroid hormones include calcitonin, somatostatin, and thyroid hormone. The parathyroid secretes parathyroid hormone. Disorders associated with hypothyroidism include goiter, myxedema, acute thyroiditis associated with bacterial infection, subacute thyroiditis associated with viral infection, autoimmune thyroiditis (Hashimoto's disease), and cretinism. Disorders associated with hyperthyroidism include thyrotoxicosis and its various forms, Graves' disease, pretibial myxedema, toxic multinodular goiter, thyroid carcinoma, and Plummer's disease. Disorders associated with hyperparathyroidism include Conn disease (chronic hypercalcemia) leading to bone resorption and parathyroid hyperplasia.
Hormones secreted by the pancreas regulate blood glucose levels by modulating the rates of carbohydrate, fat, and protein metabolism. Pancreatic hormones include insulin, glucagon, amylin, γ-aminobutyric acid, gastrin, somatostatin, and pancreatic polypeptide. The principal disorder associated with pancreatic dysfunction is diabetes mellitus caused by insufficient insulin activity. Diabetes mellitus is generally classified as either Type I (insulin-dependent, juvenile diabetes) or Type II (non-insulin-dependent, adult diabetes). The treatment of both forms by insulin replacement therapy is well known. Diabetes mellitus often leads to acute complications such as hypoglycemia (insulin shock), coma, diabetic ketoacidosis, lactic acidosis, and chronic complications leading to disorders of the eye, kidney, skin, bone, joint, cardiovascular system, nervous system, and to decreased resistance to infection.
The anatomy, physiology, and diseases related to hormonal function are reviewed in McCance, K.L. and S.E. Huether (1994) Pathophysiology: The Biological Basis for Disease in Adults and Children, Mosby-Year Book, Inc., St. Louis MO; Greenspan, F.S. and J.D. Baxter (1994) Basic and Clinical Endocrinology, Appleton and Lange, East Norwalk CT.
Growth Factors
Growth factors are secreted proteins that mediate intercellular communication. Unlike hormones, which travel great distances via the circulatory system, most growth factors are primarily local mediators that act on neighboring cells. Most growth factors contain a hydrophobic N-terminal signal peptide sequence which directs the growth factor into the secretory pathway. Most growth factors also undergo post-translational modifications within the secretory pathway. These modifications can include proteolysis, glycosylation, phosphorylation, and intramolecular disulfide bond formation. Once secreted, growth factors bind to specific receptors on the surfaces of neighboring target cells, and the bound receptors trigger intracellular signal transduction pathways. These signal transduction pathways elicit specific cellular responses in the target cells. These responses can include the modulation of gene expression and the stimulation or inhibition of cell division, cell differentiation, and cell motility.
Growth factors fall into at least two broad and overlapping classes. The broadest class includes the large polypeptide growth factors, which are wide-ranging in their effects. These factors include epidermal growth factor (EGF), fibroblast growth factor (FGF), transforming growth factor-β (TGF-β), insulin-like growth factor (IGF), nerve growth factor (NGF), and platelet-derived growth factor (PDGF), each defining a family of numerous related factors. The large polypeptide growth factors, with the exception of NGF, act as mitogens on diverse cell types to stimulate wound healing, bone synthesis and remodeling, extracellular matrix synthesis, and proliferation of epithelial, epidermal, and connective tissues. Members of the TGF-β, EGF, and FGF families also function as inductive signals in the differentiation of embryonic tissue. NGF functions specifically as a neurotrophic factor, promoting neuronal growth and differentiation.
Another class of growth factors includes the hematopoietic growth factors, which are narrow in their target specificity. These factors stimulate the proliferation and differentiation of blood cells such as B-lymphocytes, T-lymphocytes, erythrocytes, platelets, eosinophils, basophils, neutrophils, macrophages, and their stem cell precursors. These factors include the colony-stimulating factors (G-CSF, M-CSF, GM-CSF, and CSF1-3), erythropoietin, and the cytokines. The cytokines are specialized hematopoietic factors secreted by cells of the immune system and are discussed in detail below.
Growth factors play critical roles in neoplastic transformation of cells in vitro and in tumor progression in vivo. Overexpression of the large polypeptide growth factors promotes the proliferation and transformation of cells in culture. Inappropriate expression of these growth factors by tumor cells in vivo may contribute to tumor vascularization and metastasis. Inappropriate activity of hematopoietic growth factors can result in anemias, leukemias, and lymphomas. Moreover, growth factors are both structurally and functionally related to oncoproteins, the potentially cancer-causing products of proto-oncogenes. Certain FGF and PDGF family members are themselves homologous to oncoproteins, whereas receptors for some members of the EGF, NGF, and FGF families are encoded by proto-oncogenes. Growth factors also affect the transcriptional regulation of both proto-oncogenes and oncosuppressor genes (Pimentel, E. (1994) Handbook of Growth Factors, CRC Press, Ann Arbor MI; McKay, I. and I. Leigh, eds. (1993) Growth Factors: A Practical Approach, Oxford University Press, New York NY; Habenicht, A., ed. (1990) Growth Factors, Differentiation Factors, and Cytokines, Springer-Verlag, New York NY).
In addition, some of the large polypeptide growth factors play crucial roles in the induction of the primordial germ layers in the developing embryo. This induction ultimately results in the formation of the embryonic mesoderm, ectoderm, and endoderm which in turn provide the framework for the entire adult body plan. Disruption of this inductive process would be catastrophic to embryonic development.
Small Peptide Factors - Neuropeptides and Vasomediators
Neuropeptides and vasomediators (NP/VM) comprise a family of small peptide factors, typically of 20 amino acids or less. These factors generally function in neuronal excitation and inhibition of vasoconstriction/vasodilation, muscle contraction, and hormonal secretions from the brain and other endocrine tissues. Included in this family are neuropeptides and neuropeptide hormones such as bombesin, neuropeptide Y, neurotensin, neuromedin N, melanocortins, opioids, galanin, somatostatin, tachykinins, urotensin II and related peptides involved in smooth muscle stimulation, vasopressin, vasoactive intestinal peptide, and circulatory system-borne signaling molecules such as angiotensin, complement, calcitonin, endothelins, formyl-methionyl peptides, glucagon, cholecystokinin, gastrin, and many of the peptide hormones discussed above. NP/VMs can transduce signals directly, modulate the activity or release of other neurotransmitters and hormones, and act as catalytic enzymes in signaling cascades. The effects of NP/VMs range from extremely brief to long-lasting. (Reviewed in Martin, C.R. et al. (1985) Endocrine Physiology, Oxford University Press, New York NY, pp. 57-62.)
Cytokines
Cytokines comprise a family of signaling molecules that modulate the immune system and the inflammatory response. Cytokines are usually secreted by leukocytes, or white blood cells, in response to injury or infection. Cytokines function as growth and differentiation factors that act primarily on cells of the immune system such as B- and T-lymphocytes, monocytes, macrophages, and granulocytes. Like other signaling molecules, cytokines bind to specific plasma membrane receptors and trigger intracellular signal transduction pathways which alter gene expression patterns. There is considerable potential for the use of cytokines in the treatment of inflammation and immune system disorders.
Cytokine structure and function have been extensively characterized in vitro. Most cytokines are small polypeptides of about 30 kilodaltons or less. Over 50 cytokines have been identified from human and rodent sources. Examples of cytokine subfamilies include the interferons (IFN-α, -β, and -γ), the interleukins (IL1-IL13), the tumor necrosis factors (TNF-α and -β), and the chemokines. Many cytokines have been produced using recombinant DNA techniques, and the activities of individual cytokines have been determined in vitro. These activities include regulation of leukocyte proliferation, differentiation, and motility.
The activity of an individual cytokine in vitro may not reflect the full scope of that cytokine's activity in vivo. Cytokines are not expressed individually in vivo but are instead expressed in combination with a multitude of other cytokines when the organism is challenged with a stimulus. Together, these cytokines collectively modulate the immune response in a manner appropriate for that particular stimulus. Therefore, the physiological activity of a cytokine is determined by the stimulus itself and by complex interactive networks among co-expressed cytokines which may demonstrate both synergistic and antagonistic relationships.
Chemokines comprise a cytokine subfamily with over 30 members. (Reviewed in Wells, T.N.C. and M.C. Peitsch (1997) J. Leukoc. Biol. 61:545-550.) Chemokines were initially identified as chemotactic proteins that recruit monocytes and macrophages to sites of inflammation. Recent evidence indicates that chemokines may also play key roles in hematopoiesis and HIV-1 infection. Chemokines are small proteins which range from about 6-15 kilodaltons in molecular weight. Chemokines are further classified as C, CC, CXC, or CX3C based on the number and position of critical cysteine residues. The CC chemokines, for example, each contain a conserved motif consisting of two consecutive cysteines followed by two additional cysteines which occur downstream at 24- and 16-residue intervals, respectively (ExPASy PROSITE database, documents PS00472 and PDOC00434). The presence and spacing of these four cysteine residues are highly conserved, whereas the intervening residues diverge significantly. However, a conserved tyrosine located about 15 residues downstream of the cysteine doublet seems to be important for chemotactic activity. Most of the human genes encoding CC chemokines are clustered on chromosome 17, although there are a few examples of CC chemokine genes that map elsewhere. Other chemokines include lymphotactin (C chemokine); macrophage chemotactic and activating factor (MCAF/MCP-1; CC chemokine); platelet factor 4 and IL-8 (CXC chemokines); and fractalkine and neurotractin (CX3C chemokines). (Reviewed in Luster, A.D. (1998) N. Engl. J. Med. 338:436-445.)
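As an illustration only, the cysteine spacing described above for CC chemokines can be expressed as a simple sequence scan. The sketch below is not the PROSITE PS00472 pattern itself. It assumes that the third cysteine lies about 24 residues after the cysteine doublet and the fourth about 16 residues after the third, and it allows a hypothetical tolerance of a few residues in each gap. The function name `find_cc_spacing` and its parameters are introduced here for illustration.

```python
import re


def find_cc_spacing(seq, first_gap=24, second_gap=16, tol=3):
    """Return 0-based start positions of a CC...C...C spacing match.

    Assumption (not taken from PROSITE): the gaps are counted as the
    number of residues between the doublet and the third cysteine, and
    between the third and fourth cysteines, each within +/- tol.
    """
    pattern = re.compile(
        r"(?=CC.{%d,%d}C.{%d,%d}C)"
        % (first_gap - tol, first_gap + tol, second_gap - tol, second_gap + tol)
    )
    # The lookahead makes every candidate doublet position testable,
    # including positions that would overlap an earlier match.
    return [m.start() for m in pattern.finditer(seq.upper())]


# Synthetic test sequence (not a real chemokine): doublet at index 6.
synthetic = "A" * 6 + "CC" + "A" * 24 + "C" + "A" * 16 + "C" + "A" * 5
print(find_cc_spacing(synthetic))
```

A hit from this scan indicates only that the cysteine spacing is compatible with the description above. It does not establish that a sequence is a chemokine, since the conserved tyrosine and other features are not checked.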
Receptor Molecules
The term receptor describes proteins that specifically recognize other molecules. The category is broad and includes proteins with a variety of functions. The bulk of receptors are cell surface proteins which bind extracellular ligands and produce cellular responses in the areas of growth, differentiation, endocytosis, and immune response. Other receptors facilitate the selective transport of proteins out of the endoplasmic reticulum and localize enzymes to particular locations in the cell. The term may also be applied to proteins which act as receptors for ligands with known or unknown chemical composition and which interact with other cellular components. For example, the steroid hormone receptors bind to and regulate transcription of DNA.
Regulation of cell proliferation, differentiation, and migration is important for the formation and function of tissues. Regulatory proteins such as growth factors coordinately control these cellular processes and act as mediators in cell-cell signaling pathways. Growth factors are secreted proteins that bind to specific cell-surface receptors on target cells. The bound receptors trigger intracellular signal transduction pathways which activate various downstream effectors that regulate gene expression, cell division, cell differentiation, cell motility, and other cellular processes.
Cell surface receptors are typically integral plasma membrane proteins. These receptors recognize hormones such as catecholamines; peptide hormones; growth and differentiation factors; small peptide factors such as thyrotropin-releasing hormone; galanin, somatostatin, and tachykinins; and circulatory system-borne signaling molecules. Cell surface receptors on immune system cells recognize antigens, antibodies, and major histocompatibility complex (MHC)-bound peptides. Other cell surface receptors bind ligands to be internalized by the cell. This receptor-mediated endocytosis functions in the uptake of low density lipoproteins (LDL), transferrin, glucose- or mannose-terminal glycoproteins, galactose-terminal glycoproteins, immunoglobulins, phosphovitellogenins, fibrin, proteinase-inhibitor complexes, plasminogen activators, and thrombospondin (Lodish, H. et al. (1995) Molecular Cell Biology, Scientific American Books, New York NY, p. 723; Mikhailenko, I. et al. (1997) J. Biol. Chem. 272:6784-6791).
Receptor Protein Kinases
Many growth factor receptors, including receptors for epidermal growth factor, platelet-derived growth factor, fibroblast growth factor, as well as the growth modulator α-thrombin, contain intrinsic protein kinase activities. When growth factor binds to the receptor, it triggers the autophosphorylation of a serine, threonine, or tyrosine residue on the receptor. These phosphorylated sites are recognition sites for the binding of other cytoplasmic signaling proteins. These proteins participate in signaling pathways that eventually link the initial receptor activation at the cell surface to the activation of a specific intracellular target molecule. In the case of tyrosine residue autophosphorylation, these signaling proteins contain a common domain referred to as a Src homology (SH) domain. SH2 domains and SH3 domains are found in phospholipase C-γ, PI-3-K p85 regulatory subunit, Ras-GTPase activating protein, and pp60c-src (Lowenstein, E.J. et al. (1992) Cell 70:431-442). The cytokine family of receptors share a different common binding domain and include transmembrane receptors for growth hormone (GH), interleukins, erythropoietin, and prolactin.
Other receptors and second messenger-binding proteins have intrinsic serine/threonine protein kinase activity. These include activin/TGF-β/BMP-superfamily receptors, calcium- and diacylglycerol-activated/phospholipid-dependent protein kinase (PK-C), and RNA-dependent protein kinase (PK-R). In addition, other serine/threonine protein kinases, including nematode Twitchin, have fibronectin-like, immunoglobulin C2-like domains.
G-Protein Coupled Receptors
G-protein coupled receptors (GPCRs) are integral membrane proteins characterized by the presence of seven hydrophobic transmembrane domains which span the plasma membrane and form a bundle of antiparallel alpha (α) helices. These proteins range in size from under 400 to over 1000 amino acids (Strosberg, A.D. (1991) Eur. J. Biochem. 196:1-10; Coughlin, S.R. (1994) Curr. Opin. Cell Biol. 6:191-197). The amino-terminus of the GPCR is extracellular, of variable length and often glycosylated; the carboxy-terminus is cytoplasmic and generally phosphorylated. Extracellular loops of the GPCR alternate with intracellular loops and link the transmembrane domains. The most conserved domains of GPCRs are the transmembrane domains and the first two cytoplasmic loops. The transmembrane domains account for structural and functional features of the receptor. In most cases, the bundle of α helices forms a binding pocket. In addition, the extracellular N-terminal segment or one or more of the three extracellular loops may also participate in ligand binding. Ligand binding activates the receptor by inducing a conformational change in intracellular portions of the receptor. The activated receptor, in turn, interacts with an intracellular heterotrimeric guanine nucleotide binding (G) protein complex which mediates further intracellular signaling activities, generally the production of second messengers such as cyclic AMP (cAMP), phospholipase C, inositol triphosphate, or interactions with ion channel proteins (Baldwin, J.M. (1994) Curr. Opin. Cell Biol. 6:180-190).
GPCRs include those for acetylcholine, adenosine, epinephrine and norepinephrine, bombesin, bradykinin, chemokines, dopamine, endothelin, γ-aminobutyric acid (GABA), follicle-stimulating hormone (FSH), glutamate, gonadotropin-releasing hormone (GnRH), hepatocyte growth factor, histamine, leukotrienes, melanocortins, neuropeptide Y, opioid peptides, opsins, prostanoids, serotonin, somatostatin, tachykinins, thrombin, thyrotropin-releasing hormone (TRH), vasoactive intestinal polypeptide family, vasopressin and oxytocin, and orphan receptors. GPCR mutations, which may cause loss of function or constitutive activation, have been associated with numerous human diseases (Coughlin, supra). For instance, retinitis pigmentosa may arise from mutations in the rhodopsin gene. Rhodopsin is the retinal photoreceptor which is located within the discs of the eye rod cell. Parma, J. et al. (1993, Nature 365:649-651) report that somatic activating mutations in the thyrotropin receptor cause hyperfunctioning thyroid adenomas and suggest that certain GPCRs susceptible to constitutive activation may behave as protooncogenes.
Nuclear Receptors
Nuclear receptors bind small molecules such as hormones or second messengers, leading to increased receptor-binding affinity to specific chromosomal DNA elements. In addition, the affinity for other nuclear proteins may also be altered. Such binding and protein-protein interactions may regulate and modulate gene expression. Examples of such receptors include the steroid hormone receptors family, the retinoic acid receptors family, and the thyroid hormone receptors family.
Ligand-Gated Receptor Ion Channels
Ligand-gated receptor ion channels fall into two categories. The first category, extracellular ligand-gated receptor ion channels (ELGs), rapidly transduce neurotransmitter-binding events into electrical signals, such as fast synaptic neurotransmission. ELG function is regulated by post-translational modification. The second category, intracellular ligand-gated receptor ion channels (ILGs), are activated by many intracellular second messengers and do not require post-translational modification(s) to effect a channel-opening response.
ELGs depolarize excitable cells to the threshold of action potential generation. In non-excitable cells, ELGs permit a limited calcium ion-influx during the presence of agonist. ELGs include channels directly gated by neurotransmitters such as acetylcholine, L-glutamate, glycine, ATP, serotonin, GABA, and histamine. ELG genes encode proteins having strong structural and functional similarities. ILGs are encoded by distinct and unrelated gene families and include receptors for cAMP, cGMP, calcium ions, ATP, and metabolites of arachidonic acid.
Macrophage Scavenger Receptors
Macrophage scavenger receptors with broad ligand specificity may participate in the binding of low density lipoproteins (LDL) and foreign antigens. Scavenger receptors types I and II are trimeric membrane proteins with each subunit containing a small N-terminal intracellular domain, a transmembrane domain, a large extracellular domain, and a C-terminal cysteine-rich domain. The extracellular domain contains a short spacer domain, an α-helical coiled-coil domain, and a triple helical collagenous domain. These receptors have been shown to bind a spectrum of ligands, including chemically modified lipoproteins and albumin, polyribonucleotides, polysaccharides, phospholipids, and asbestos (Matsumoto, A. et al. (1990) Proc. Natl. Acad. Sci. USA 87:9133-9137; Elomaa, O. et al. (1995) Cell 80:603-609). The scavenger receptors are thought to play a key role in atherogenesis by mediating uptake of modified LDL in arterial walls, and in host defense by binding bacterial endotoxins, bacteria, and protozoa.
T-Cell Receptors
T cells play a dual role in the immune system as effectors and regulators, coupling antigen recognition with the transmission of signals that induce cell death in infected cells and stimulate proliferation of other immune cells. Although a population of T cells can recognize a wide range of different antigens, an individual T cell can only recognize a single antigen and only when it is presented to the T cell receptor (TCR) as a peptide complexed with a major histocompatibility molecule (MHC) on the surface of an antigen presenting cell. The TCR on most T cells consists of immunoglobulin-like integral membrane glycoproteins containing two polypeptide subunits, α and β, of similar molecular weight. Both TCR subunits have an extracellular domain containing both variable and constant regions, a transmembrane domain that traverses the membrane once, and a short intracellular domain (Saito, H. et al. (1984) Nature 309:757-762). The genes for the TCR subunits are constructed through somatic rearrangement of different gene segments. Interaction of antigen in the proper MHC context with the TCR initiates signaling cascades that induce the proliferation, maturation, and function of cellular components of the immune system (Weiss, A. (1991) Annu. Rev. Genet. 25:487-510). Rearrangements in TCR genes and alterations in TCR expression have been noted in lymphomas, leukemias, autoimmune disorders, and immunodeficiency disorders (Aisenberg, A.C. et al. (1985) N. Engl. J. Med. 313:529-533; Weiss, supra).
Intracellular Signaling Molecules
Intracellular signaling is the general process by which cells respond to extracellular signals (hormones, neurotransmitters, growth and differentiation factors, etc.) through a cascade of biochemical reactions that begins with the binding of a signaling molecule to a cell membrane receptor and ends with the activation of an intracellular target molecule. Intermediate steps in the process involve the activation of various cytoplasmic proteins by phosphorylation via protein kinases, and their deactivation by protein phosphatases, and the eventual translocation of some of these activated proteins to the cell nucleus where the transcription of specific genes is triggered. The intracellular signaling process regulates all types of cell functions including cell proliferation, cell differentiation, and gene transcription, and involves a diversity of molecules including protein kinases and phosphatases, and second messenger molecules, such as cyclic nucleotides, calcium-calmodulin, inositol, and various mitogens, that regulate protein phosphorylation.
Protein Phosphorylation
Protein kinases and phosphatases play a key role in the intracellular signaling process by controlling the phosphorylation and activation of various signaling proteins. The high energy phosphate for this reaction is generally transferred from the adenosine triphosphate molecule (ATP) to a particular protein by a protein kinase and removed from that protein by a protein phosphatase. Protein kinases are roughly divided into two groups: those that phosphorylate tyrosine residues (protein tyrosine kinases, PTK) and those that phosphorylate serine or threonine residues (serine/threonine kinases, STK). A few protein kinases have dual specificity for serine/threonine and tyrosine residues. Almost all kinases contain a conserved 250-300 amino acid catalytic domain containing specific residues and sequence motifs characteristic of the kinase family (Hardie, G. and S. Hanks (1995) The Protein Kinase Facts Books, Vol 1:7-20, Academic Press, San Diego CA).
STKs include the second messenger dependent protein kinases such as the cyclic-AMP dependent protein kinases (PKA), involved in mediating hormone-induced cellular responses; calcium-calmodulin (CaM) dependent protein kinases, involved in regulation of smooth muscle contraction, glycogen breakdown, and neurotransmission; and the mitogen-activated protein kinases (MAP) which mediate signal transduction from the cell surface to the nucleus via phosphorylation cascades. Altered PKA expression is implicated in a variety of disorders and diseases including cancer, thyroid disorders, diabetes, atherosclerosis, and cardiovascular disease (Isselbacher, K.J. et al. (1994) Harrison's Principles of Internal Medicine, McGraw-Hill, New York NY, pp. 416-431, 1887). PTKs are divided into transmembrane, receptor PTKs and nontransmembrane, non-receptor PTKs. Transmembrane PTKs are receptors for most growth factors. Non-receptor PTKs lack transmembrane regions and, instead, form complexes with the intracellular regions of cell surface receptors. Receptors that function through non-receptor PTKs include those for cytokines and hormones (growth hormone and prolactin) and antigen-specific receptors on T and B lymphocytes. Many of these PTKs were first identified as the products of mutant oncogenes in cancer cells in which their activation was no longer subject to normal cellular controls. In fact, about one third of the known oncogenes encode PTKs, and it is well known that cellular transformation (oncogenesis) is often accompanied by increased tyrosine phosphorylation activity (Charbonneau, H. and N.K. Tonks (1992) Annu. Rev. Cell Biol. 8:463-493).
An additional family of protein kinases previously thought to exist only in procaryotes is the histidine protein kinase family (HPK). HPKs bear little homology with mammalian STKs or PTKs but have distinctive sequence motifs of their own (Davie, J.R. et al. (1995) J. Biol. Chem. 270:19861-19867). A histidine residue in the N-terminal half of the molecule (region I) is an autophosphorylation site. Three additional motifs located in the C-terminal half of the molecule include an invariant asparagine residue in region II and two glycine-rich loops characteristic of nucleotide binding domains in regions III and IV. Recently a branched chain alpha-ketoacid dehydrogenase kinase has been found with characteristics of HPK in rat (Davie, supra).
Protein phosphatases regulate the effects of protein kinases by removing phosphate groups from molecules previously activated by kinases. The two principal categories of protein phosphatases are the protein (serine/threonine) phosphatases (PPs) and the protein tyrosine phosphatases (PTPs). PPs dephosphorylate phosphoserine/threonine residues and are important regulators of many cAMP-mediated hormone responses (Cohen, P. (1989) Annu. Rev. Biochem. 58:453-508). PTPs reverse the effects of protein tyrosine kinases and play a significant role in cell cycle and cell signaling processes (Charbonneau, supra). As previously noted, many PTKs are encoded by oncogenes, and oncogenesis is often accompanied by increased tyrosine phosphorylation activity. It is therefore possible that PTPs may prevent or reverse cell transformation and the growth of various cancers by controlling the levels of tyrosine phosphorylation in cells. This hypothesis is supported by studies showing that overexpression of PTPs can suppress transformation in cells, and that specific inhibition of PTPs can enhance cell transformation (Charbonneau, supra).
Phospholipid and Inositol-Phosphate Signaling
Inositol phospholipids (phosphoinositides) are involved in an intracellular signaling pathway that begins with binding of a signaling molecule to a G-protein linked receptor in the plasma membrane. This leads to the phosphorylation of phosphatidylinositol (PI) residues on the inner side of the plasma membrane to the biphosphate state (PIP2) by inositol kinases. Simultaneously, the G-protein linked receptor binding stimulates a trimeric G-protein which in turn activates a phosphoinositide-specific phospholipase C-β. Phospholipase C-β then cleaves PIP2 into two products, inositol triphosphate (IP3) and diacylglycerol. These two products act as mediators for separate signaling events. IP3 diffuses through the plasma membrane to induce calcium release from the endoplasmic reticulum (ER), while diacylglycerol remains in the membrane and helps activate protein kinase C, an STK that phosphorylates selected proteins in the target cell. The calcium response initiated by IP3 is terminated by the dephosphorylation of IP3 by specific inositol phosphatases. Cellular responses that are mediated by this pathway are glycogen breakdown in the liver in response to vasopressin, smooth muscle contraction in response to acetylcholine, and thrombin-induced platelet aggregation.
Cyclic Nucleotide Signaling
Cyclic nucleotides (cAMP and cGMP) function as intracellular second messengers to transduce a variety of extracellular signals including hormones, light, and neurotransmitters. In particular, cyclic-AMP dependent protein kinases (PKA) are thought to account for all of the effects of cAMP in most mammalian cells, including various hormone-induced cellular responses. Visual excitation and the phototransmission of light signals in the eye are controlled by cyclic-GMP regulated, Ca2+-specific channels. Because cellular levels of cyclic nucleotides mediate these various responses, the synthesis and breakdown of cyclic nucleotides are tightly regulated. Thus adenylyl cyclase, which synthesizes cAMP from ATP, is activated to increase cAMP levels in muscle by binding of adrenaline to β-adrenergic receptors, while activation of guanylate cyclase and increased cGMP levels in photoreceptors leads to reopening of the Ca2+-specific channels and recovery of the dark state in the eye. In contrast, hydrolysis of cyclic nucleotides by cAMP and cGMP-specific phosphodiesterases (PDEs) produces the opposite of these and other effects mediated by increased cyclic nucleotide levels. PDEs appear to be particularly important in the regulation of cyclic nucleotides, considering the diversity found in this family of proteins. At least seven families of mammalian PDEs (PDE1-7) have been identified based on substrate specificity and affinity, sensitivity to cofactors, and sensitivity to inhibitory drugs (Beavo, J.A. (1995) Physiological Reviews 75:725-748). PDE inhibitors have been found to be particularly useful in treating various clinical disorders. Rolipram, a specific inhibitor of PDE4, has been used in the treatment of depression, and similar inhibitors are undergoing evaluation as anti-inflammatory agents. Theophylline is a nonspecific PDE inhibitor used in the treatment of bronchial asthma and other respiratory diseases (Banner, K.H. and C.P. Page (1995) Eur. Respir. J. 8:996-1000).
G-Protein Signaling
Guanine nucleotide binding proteins (G-proteins) are critical mediators of signal transduction between a particular class of extracellular receptors, the G-protein coupled receptors (GPCR), and intracellular second messengers such as cAMP and Ca2+. G-proteins are linked to the cytosolic side of a GPCR such that activation of the GPCR by ligand binding stimulates binding of the G-protein to GTP, inducing an "active" state in the G-protein. In the active state, the G-protein acts as a signal to trigger other events in the cell such as the increase of cAMP levels or the release of Ca2+ into the cytosol from the ER, which, in turn, regulate phosphorylation and activation of other intracellular proteins. Recycling of the G-protein to the inactive state involves hydrolysis of the bound GTP to GDP by a GTPase activity in the G-protein. (See Alberts, B. et al. (1994) Molecular Biology of the Cell, Garland Publishing, Inc., New York NY, pp. 734-759.) Two structurally distinct classes of G-proteins are recognized: heterotrimeric G-proteins, consisting of three different subunits, and monomeric, low molecular weight (LMW), G-proteins consisting of a single polypeptide chain. The three polypeptide subunits of heterotrimeric G-proteins are the α, β, and γ subunits. The α subunit binds and hydrolyzes GTP. The β and γ subunits form a tight complex that anchors the protein to the inner side of the plasma membrane. The β subunits, also known as G-β proteins or β transducins, contain seven tandem repeats of the WD-repeat sequence motif, a motif found in many proteins with regulatory functions. Mutations and variant expression of β transducin proteins are linked with various disorders (Neer, E.J. et al. (1994) Nature 371:297-300; Margottin, F. et al. (1998) Mol. Cell 1:565-574).
LMW GTP-proteins are GTPases which regulate cell growth, cell cycle control, protein secretion, and intracellular vesicle interaction. They consist of single polypeptides which, like the α subunit of the heterotrimeric G-proteins, are able to bind and hydrolyze GTP, thus cycling between an inactive and an active state. At least sixty members of the LMW G-protein superfamily have been identified and are currently grouped into the six subfamilies of ras, rho, arf, sar1, ran, and rab. Activated ras genes were initially found in human cancers, and subsequent studies confirmed that ras function is critical in determining whether cells continue to grow or become differentiated. Other members of the LMW G-protein superfamily have roles in signal transduction that vary with the function of the activated genes and the locations of the G-proteins.
Guanine nucleotide exchange factors regulate the activities of LMW G-proteins by determining whether GTP or GDP is bound. GTPase-activating protein (GAP) binds to GTP-ras and induces it to hydrolyze GTP to GDP. In contrast, guanine nucleotide releasing protein (GNRP) binds to GDP-ras and induces the release of GDP and the binding of GTP.
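The GTP/GDP cycle described above can be pictured as a two-state switch. The following is a minimal illustrative sketch only, assuming that the GTP-bound form is the active state (as stated above for G-proteins generally); the names `RasState` and `apply_regulator` are hypothetical and are introduced purely for illustration.

```python
from enum import Enum


class RasState(Enum):
    """Nucleotide-bound states of ras (labels are illustrative)."""
    GDP_BOUND = "GDP-ras (inactive)"
    GTP_BOUND = "GTP-ras (active)"


def apply_regulator(state: RasState, regulator: str) -> RasState:
    """Return the ras state after a regulator acts on it.

    GAP acts on GTP-ras and induces hydrolysis of GTP to GDP.
    GNRP acts on GDP-ras and induces release of GDP and binding of GTP.
    Any other combination leaves the state unchanged.
    """
    if regulator == "GAP" and state is RasState.GTP_BOUND:
        return RasState.GDP_BOUND
    if regulator == "GNRP" and state is RasState.GDP_BOUND:
        return RasState.GTP_BOUND
    return state


if __name__ == "__main__":
    state = RasState.GDP_BOUND
    for regulator in ("GNRP", "GAP"):
        state = apply_regulator(state, regulator)
        print(regulator, "->", state.value)
```

The sketch ignores kinetics and the intrinsic GTPase activity of ras; it captures only which regulator moves ras between its two nucleotide-bound states.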
Other regulators of G-protein signaling (RGS) also exist that act primarily by negatively regulating the G-protein pathway by an unknown mechanism (Druey, K.M. et al. (1996) Nature 379:742-746). Some 15 members of the RGS family have been identified. RGS family members are related structurally through similarities in an approximately 120 amino acid region termed the RGS domain and functionally by their ability to inhibit the interleukin (cytokine) induction of MAP kinase in cultured mammalian 293T cells (Druey, supra).
Calcium Signaling Molecules
Ca2+ is another second messenger molecule that is even more widely used as an intracellular mediator than cAMP. Two pathways exist by which Ca2+ can enter the cytosol in response to extracellular signals: One pathway acts primarily in nerve signal transduction where Ca2+ enters a nerve terminal through a voltage-gated Ca2+ channel. The second is a more ubiquitous pathway in which Ca2+ is released from the ER into the cytosol in response to binding of an extracellular signaling molecule to a receptor. Ca2+ directly activates regulatory enzymes, such as protein kinase C, which trigger signal transduction pathways. Ca2+ also binds to specific Ca2+-binding proteins (CBPs) such as calmodulin (CaM) which then activate multiple target proteins in the cell including enzymes, membrane transport pumps, and ion channels. CaM interactions are involved in a multitude of cellular processes including, but not limited to, gene regulation, DNA synthesis, cell cycle progression, mitosis, cytokinesis, cytoskeletal organization, muscle contraction, signal transduction, ion homeostasis, exocytosis, and metabolic regulation (Celio, M.R. et al. (1996) Guidebook to Calcium-binding Proteins, Oxford University Press, Oxford, UK, pp. 15-20). Some CBPs can serve as a storage depot for Ca2+ in an inactive state. Calsequestrin is one such CBP that is expressed in isoforms specific to cardiac muscle and skeletal muscle. It is suggested that calsequestrin binds Ca2+ in a rapidly exchangeable state that is released during Ca2+-signaling conditions (Celio, M.R. et al. (1996) Guidebook to Calcium-binding Proteins, Oxford University Press, New York NY, pp. 222-224).
Cyclins
Cell division is the fundamental process by which all living things grow and reproduce. In most organisms, the cell cycle consists of three principal steps: interphase, mitosis, and cytokinesis. Interphase involves preparations for cell division, replication of the DNA, and production of essential proteins. In mitosis, the nuclear material is divided and separates to opposite sides of the cell. Cytokinesis is the final division and fission of the cell cytoplasm to produce the daughter cells.
The entry and exit of a cell from mitosis is regulated by the synthesis and destruction of a family of activating proteins called cyclins. Cyclins act by binding to and activating a group of cyclin-dependent protein kinases (Cdks) which then phosphorylate and activate selected proteins involved in the mitotic process. Several types of cyclins exist. (Ciechanover, A. (1994) Cell 79:13-21.) Two principal types are mitotic cyclin, or cyclin B, which controls entry of the cell into mitosis, and G1 cyclin, which controls events that drive the cell out of mitosis.
Signal Complex Scaffolding Proteins
Certain proteins in intracellular signaling pathways serve to link or cluster other proteins involved in the signaling cascade. A conserved protein domain called the PDZ domain has been identified in various membrane-associated signaling proteins. This domain has been implicated in receptor and ion channel clustering and in the targeting of multiprotein signaling complexes to specialized functional regions of the cytosolic face of the plasma membrane. (For a review of PDZ domain-containing proteins, see Ponting, C.P. et al. (1997) Bioessays 19:469-479.) A large proportion of PDZ domains are found in the eukaryotic MAGUK (membrane-associated guanylate kinase) protein family, members of which bind to the intracellular domains of receptors and channels. However, PDZ domains are also found in diverse membrane-localized proteins such as protein tyrosine phosphatases, serine/threonine kinases, G-protein cofactors, and synapse-associated proteins such as syntrophins and neuronal nitric oxide synthase (nNOS). Generally, about one to three PDZ domains are found in a given protein, although up to nine PDZ domains have been identified in a single protein.
Membrane Transport Molecules
The plasma membrane acts as a barrier to most molecules. Transport between the cytoplasm and the extracellular environment, and between the cytoplasm and lumenal spaces of cellular organelles requires specific transport proteins. Each transport protein carries a particular class of molecule, such as ions, sugars, or amino acids, and often is specific to a certain molecular species of the class. A variety of human inherited diseases are caused by a mutation in a transport protein. For example, cystinuria is an inherited disease that results from the inability to transport cystine, the disulfide-linked dimer of cysteine, from the urine into the blood. Accumulation of cystine in the urine leads to the formation of cystine stones in the kidneys.
Transport proteins are multi-pass transmembrane proteins, which either actively transport molecules across the membrane or passively allow them to cross. Active transport involves directional pumping of a solute across the membrane, usually against an electrochemical gradient. Active transport is tightly coupled to a source of metabolic energy, such as ATP hydrolysis or an electrochemically favorable ion gradient. Passive transport involves the movement of a solute down its electrochemical gradient. Transport proteins can be further classified as either carrier proteins or channel proteins. Carrier proteins, which can function in active or passive transport, bind to a specific solute to be transported and undergo a conformational change which transfers the bound solute across the membrane. Channel proteins, which only function in passive transport, form hydrophilic pores across the membrane. When the pores open, specific solutes, such as inorganic ions, pass through the membrane and down the electrochemical gradient of the solute.
Carrier proteins which transport a single solute from one side of the membrane to the other are called uniporters. In contrast, coupled transporters link the transfer of one solute with simultaneous or sequential transfer of a second solute, either in the same direction (symport) or in the opposite direction (antiport). For example, intestinal and kidney epithelium contains a variety of symporter systems driven by the sodium gradient that exists across the plasma membrane. Sodium moves into the cell down its electrochemical gradient and brings the solute into the cell with it. The sodium gradient that provides the driving force for solute uptake is maintained by the ubiquitous Na+/K+ ATPase. Sodium-coupled transporters include the mammalian glucose transporter (SGLT1), iodide transporter (NIS), and multivitamin transporter (SMVT). All three transporters have twelve putative transmembrane segments, extracellular glycosylation sites, and cytoplasmically-oriented N- and C-termini. NIS plays a crucial role in the evaluation, diagnosis, and treatment of various thyroid pathologies because it is the molecular basis for radioiodide thyroid-imaging techniques and for specific targeting of radioisotopes to the thyroid gland (Levy, O. et al. (1997) Proc. Natl. Acad. Sci. USA 94:5568-5573). SMVT is expressed in the intestinal mucosa, kidney, and placenta, and is implicated in the transport of the water-soluble vitamins, e.g., biotin and pantothenate (Prasad, P.D. et al. (1998) J. Biol. Chem. 273:7501-7506).
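The uniport, symport, and antiport definitions above amount to a simple rule on the number of solutes moved per transport cycle and their directions. The following is a minimal sketch of that rule; the function name `classify_carrier` and the "in"/"out" direction labels are hypothetical and serve only as an illustration.

```python
def classify_carrier(directions):
    """Classify a carrier protein from the directions of the solutes it moves.

    `directions` holds one entry per transported solute, each either
    "in" (into the cell) or "out" (out of the cell).
    """
    if not directions:
        raise ValueError("at least one transported solute is required")
    if any(d not in ("in", "out") for d in directions):
        raise ValueError("directions must be 'in' or 'out'")
    if len(directions) == 1:
        return "uniport"       # a single solute crosses the membrane
    if len(set(directions)) == 1:
        return "symport"       # coupled solutes move in the same direction
    return "antiport"          # coupled solutes move in opposite directions


if __name__ == "__main__":
    # Sodium-coupled glucose uptake: Na+ and glucose both enter the cell.
    print(classify_carrier(["in", "in"]))
    # A generic exchanger moving one solute in and another out.
    print(classify_carrier(["in", "out"]))
```

The rule says nothing about energy coupling; whether a given carrier performs active or passive transport depends on the gradients of the solutes involved.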
Transporters play a major role in the regulation of pH, excretion of drugs, and the cellular K+/Na+ balance. Monocarboxylate anion transporters are proton-coupled symporters with a broad substrate specificity that includes L-lactate, pyruvate, and the ketone bodies acetate, acetoacetate, and beta-hydroxybutyrate. At least seven isoforms have been identified to date. The isoforms are predicted to have twelve transmembrane (TM) helical domains with a large intracellular loop between TM6 and TM7, and play a critical role in maintaining intracellular pH by removing the protons that are produced stoichiometrically with lactate during glycolysis. The best characterized H(+)-monocarboxylate transporter is that of the erythrocyte membrane, which transports L-lactate and a wide range of other aliphatic monocarboxylates. Other cells possess H(+)-linked monocarboxylate transporters with differing substrate and inhibitor selectivities. In particular, cardiac muscle and tumor cells have transporters that differ in their Km values for certain substrates, including stereoselectivity for L- over D-lactate, and in their sensitivity to inhibitors. There are Na(+)-monocarboxylate cotransporters on the luminal surface of intestinal and kidney epithelia, which allow the uptake of lactate, pyruvate, and ketone bodies in these tissues. In addition, there are specific and selective transporters for organic cations and organic anions in organs including the kidney, intestine and liver. Organic anion transporters are selective for hydrophobic, charged molecules with electron-attracting side groups. Organic cation transporters, such as the ammonium transporter, mediate the secretion of a variety of drugs and endogenous metabolites, and contribute to the maintenance of intercellular pH. (Poole, R.C. and A.P. Halestrap (1993) Am. J. Physiol. 264:C761-C782; Price, N.T. et al. (1998) Biochem. J. 329:321-328; and Martinelle, K. and I. Haggstrom (1993) J. Biotechnol. 30:339-350.)
The largest and most diverse family of transport proteins known is the ATP-binding cassette (ABC) transporters. As a family, ABC transporters can transport substances that differ markedly in chemical structure and size, ranging from small molecules such as ions, sugars, amino acids, peptides, and phospholipids, to lipopeptides, large proteins, and complex hydrophobic drugs. ABC proteins consist of four modules: two nucleotide-binding domains (NBD), which hydrolyze ATP to supply the energy required for transport, and two membrane-spanning domains (MSD), each containing six putative transmembrane segments. These four modules may be encoded by a single gene, as is the case for the cystic fibrosis transmembrane regulator (CFTR), or by separate genes. When encoded by separate genes, each gene product contains a single NBD and MSD. These "half-molecules" form homo- and heterodimers, such as Tap1 and Tap2, the endoplasmic reticulum-based major histocompatibility (MHC) peptide transport system. Several genetic diseases are attributed to defects in ABC transporters, such as the following diseases and their corresponding proteins: cystic fibrosis (CFTR, an ion channel), adrenoleukodystrophy (adrenoleukodystrophy protein, ALDP), Zellweger syndrome (peroxisomal membrane protein-70, PMP70), and hyperinsulinemic hypoglycemia (sulfonylurea receptor, SUR). Overexpression of the multidrug resistance (MDR) protein, another ABC transporter, in human cancer cells makes the cells resistant to a variety of cytotoxic drugs used in chemotherapy (Taglicht, D. and S. Michaelis (1998) Meth. Enzymol. 292:131-163).
Transport of fatty acids across the plasma membrane can occur by diffusion, a high capacity, low affinity process. However, under normal physiological conditions a significant fraction of fatty acid transport appears to occur via a high affinity, low capacity protein-mediated transport process. Fatty acid transport protein (FATP), an integral membrane protein with four transmembrane segments, is expressed in tissues exhibiting high levels of plasma membrane fatty acid flux, such as muscle, heart, and adipose. Expression of FATP is upregulated in 3T3-L1 cells during adipose conversion, and expression in COS7 fibroblasts elevates uptake of long-chain fatty acids (Hui, T.Y. et al. (1998) J. Biol. Chem. 273:27420-27429).
Ion Channels
The electrical potential of a cell is generated and maintained by controlling the movement of ions across the plasma membrane. The movement of ions requires ion channels, which form an ion-selective pore within the membrane. There are two basic types of ion channels, ion transporters and gated ion channels. Ion transporters utilize the energy obtained from ATP hydrolysis to actively transport an ion against the ion's concentration gradient. Gated ion channels allow passive flow of an ion down the ion's electrochemical gradient under restricted conditions. Together, these types of ion channels generate, maintain, and utilize an electrochemical gradient that is used in 1) electrical impulse conduction down the axon of a nerve cell, 2) transport of molecules into cells against concentration gradients, 3) initiation of muscle contraction, and 4) endocrine cell secretion. Ion transporters generate and maintain the resting electrical potential of a cell. Utilizing the energy derived from ATP hydrolysis, they transport ions against the ion's concentration gradient. These transmembrane ATPases are divided into three families. The phosphorylated (P) class ion transporters, including Na+-K+ ATPase, Ca2+-ATPase, and H+-ATPase, are activated by a phosphorylation event. P-class ion transporters are responsible for maintaining resting potential distributions such that cytosolic concentrations of Na+ and Ca2+ are low and cytosolic concentration of K+ is high. The vacuolar (V) class of ion transporters includes H+ pumps on intracellular organelles, such as lysosomes and Golgi. V-class ion transporters are responsible for generating the low pH within the lumen of these organelles that is required for function. The coupling factor (F) class consists of H+ pumps in the mitochondria. F-class ion transporters utilize a proton gradient to generate ATP from ADP and inorganic phosphate (Pi).
The resting potential of the cell is utilized in many processes involving carrier proteins and gated ion channels. Carrier proteins utilize the resting potential to transport molecules into and out of the cell. Amino acid and glucose transport into many cells is linked to sodium ion co-transport (symport) so that the movement of Na+ down an electrochemical gradient drives transport of the other molecule up a concentration gradient. Similarly, cardiac muscle links transfer of Ca2+ out of the cell with transport of Na+ into the cell (antiport).
Ion channels share common structural and mechanistic themes. The channel consists of four or five subunits or protein monomers that are arranged like a barrel in the plasma membrane. Each subunit typically consists of six potential transmembrane segments (S1, S2, S3, S4, S5, and S6). The center of the barrel forms a pore lined by α-helices or β-strands. The side chains of the amino acid residues comprising the α-helices or β-strands establish the charge (cation or anion) selectivity of the channel. The degree of selectivity, or what specific ions are allowed to pass through the channel, depends on the diameter of the narrowest part of the pore.
Gated ion channels control ion flow by regulating the opening and closing of pores. These channels are categorized according to the manner of regulating the gating function. Mechanically-gated channels open pores in response to mechanical stress, voltage-gated channels open pores in response to changes in membrane potential, and ligand-gated channels open pores in the presence of a specific ion, nucleotide, or neurotransmitter.
Voltage-gated Na+ and K+ channels are necessary for the function of electrically excitable cells, such as nerve and muscle cells. Action potentials, which lead to neurotransmitter release and muscle contraction, arise from large, transient changes in the permeability of the membrane to Na+ and K+ ions. Depolarization of the membrane beyond the threshold level opens voltage-gated Na+ channels. Sodium ions flow into the cell, further depolarizing the membrane and opening more voltage-gated Na+ channels, which propagates the depolarization down the length of the cell. Depolarization also opens voltage-gated potassium channels. Consequently, potassium ions flow outward, which leads to repolarization of the membrane. Voltage-gated channels utilize charged residues in the fourth transmembrane segment (S4) to sense voltage change. The open state lasts only about 1 millisecond, at which time the channel spontaneously converts into an inactive state that cannot be opened irrespective of the membrane potential. Inactivation is mediated by the channel's N-terminus, which acts as a plug that closes the pore. The transition from an inactive to a closed state requires a return to resting potential.
Voltage-gated Na+ channels are heterotrimeric complexes composed of a 260 kDa pore forming α subunit that associates with two smaller auxiliary subunits, β1 and β2. The β2 subunit is an integral membrane glycoprotein that contains an extracellular Ig domain, and its association with α and β1 subunits correlates with increased functional expression of the channel, a change in its gating properties, and an increase in whole cell capacitance due to an increase in membrane surface area. (Isom, L.L. et al. (1995) Cell 83:433-442.)
Voltage-gated Ca2+ channels are involved in presynaptic neurotransmitter release, and heart and skeletal muscle contraction. The voltage-gated Ca2+ channels from skeletal muscle (L-type) and brain (N-type) have been purified, and though their functions differ dramatically, they have similar subunit compositions. The channels are composed of three subunits. The α1 subunit forms the membrane pore and voltage sensor, while the α2δ and β subunits modulate the voltage-dependence, gating properties, and the current amplitude of the channel. These subunits are encoded by at least six α1, one α2δ, and four β genes. A fourth subunit, γ, has been identified in skeletal muscle. (Walker, D. et al. (1998) J. Biol. Chem. 273:2361-2367; and Jay, S.D. et al. (1990) Science 248:490-492.)
Chloride channels are necessary in endocrine secretion and in regulation of cytosolic and organelle pH. In secretory epithelial cells, Cl- enters the cell across a basolateral membrane through an Na+, K+/Cl- cotransporter, accumulating in the cell above its electrochemical equilibrium concentration. Secretion of Cl- from the apical surface, in response to hormonal stimulation, leads to flow of Na+ and water into the secretory lumen. The cystic fibrosis transmembrane conductance regulator (CFTR) is a chloride channel encoded by the gene for cystic fibrosis, a common fatal genetic disorder in humans. Loss of CFTR function decreases transepithelial water secretion and, as a result, the layers of mucus that coat the respiratory tree, pancreatic ducts, and intestine are dehydrated and difficult to clear. The resulting blockage of these sites leads to pancreatic insufficiency, "meconium ileus", and devastating "chronic obstructive pulmonary disease" (Al-Awqati, Q. et al. (1992) J. Exp. Biol. 172:245-266).
Many intracellular organelles contain H+-ATPase pumps that generate transmembrane pH and electrochemical differences by moving protons from the cytosol to the organelle lumen. If the membrane of the organelle is permeable to other ions, then the electrochemical gradient can be abrogated without affecting the pH differential. In fact, removal of the electrochemical barrier allows more H+ to be pumped across the membrane, increasing the pH differential. Cl- is the sole counterion of H+ translocation in a number of organelles, including chromaffin granules, Golgi vesicles, lysosomes, and endosomes. Functions that require a low vacuolar pH include uptake of small molecules such as biogenic amines in chromaffin granules, processing of vacuolar constituents such as pro-hormones by proteolytic enzymes, and protein degradation in lysosomes (Al-Awqati, supra).
Ligand-gated channels open their pores when an extracellular or intracellular mediator binds to the channel. Neurotransmitter-gated channels are channels that open when a neurotransmitter binds to their extracellular domain. These channels exist in the postsynaptic membrane of nerve or muscle cells. There are two types of neurotransmitter-gated channels. Sodium channels open in response to excitatory neurotransmitters, such as acetylcholine, glutamate, and serotonin. This opening causes an influx of Na+ and produces the initial localized depolarization that activates the voltage-gated channels and starts the action potential. Chloride channels open in response to inhibitory neurotransmitters, such as γ-aminobutyric acid (GABA) and glycine, leading to hyperpolarization of the membrane and the subsequent inhibition of action potential generation. Ligand-gated channels can be regulated by intracellular second messengers. Calcium-activated K+ channels are gated by internal calcium ions. In nerve cells, an influx of calcium during depolarization opens K+ channels to modulate the magnitude of the action potential (Ishi, T.M. et al. (1997) Proc. Natl. Acad. Sci. USA 94:11651-11656). Cyclic nucleotide-gated (CNG) channels are gated by cytosolic cyclic nucleotides. The best examples of these are the cAMP-gated Na+ channels involved in olfaction and the cGMP-gated cation channels involved in vision. Both systems involve ligand-mediated activation of a G-protein coupled receptor which then alters the level of cyclic nucleotide within the cell.
Ion channels are expressed in a number of tissues where they are implicated in a variety of processes. CNG channels, while abundantly expressed in photoreceptor and olfactory sensory cells, are also found in kidney, lung, pineal, retinal ganglion cells, testis, aorta, and brain. Calcium-activated K+ channels may be responsible for the vasodilatory effects of bradykinin in the kidney and for shunting excess K+ from brain capillary endothelial cells into the blood. They are also implicated in repolarizing granulocytes after agonist-stimulated depolarization (Ishi, supra). Ion channels have been the target for many drug therapies. Neurotransmitter-gated channels have been targeted in therapies for treatment of insomnia, anxiety, depression, and schizophrenia. Voltage-gated channels have been targeted in therapies for arrhythmia, ischemic stroke, head trauma, and neurodegenerative disease (Taylor, C.P. and L.S. Narasimhan (1997) Adv. Pharmacol. 39:47-98).
Disease Correlation
The etiology of numerous human diseases and disorders can be attributed to defects in the transport of molecules across membranes. Defects in the trafficking of membrane-bound transporters and ion channels are associated with several disorders, e.g., cystic fibrosis, glucose-galactose malabsorption syndrome, hypercholesterolemia, von Gierke disease, and certain forms of diabetes mellitus. Single-gene defect diseases resulting in an inability to transport small molecules across membranes include, e.g., cystinuria, iminoglycinuria, Hartnup disease, and Fanconi disease (van't Hoff, W.G. (1996) Exp. Nephrol. 4:253-262; Talente, G.M. et al. (1994) Ann. Intern. Med. 120:218-226; and Chillon, M. et al. (1995) New Engl. J. Med. 332:1475-1480).
Protein Modification and Maintenance Molecules
The cellular processes regulating modification and maintenance of protein molecules coordinate their conformation, stabilization, and degradation. Each of these processes is mediated by key enzymes or proteins such as proteases, protease inhibitors, transferases, isomerases, and molecular chaperones.
Proteases
Proteases cleave proteins and peptides at the peptide bond that forms the backbone of the peptide and protein chain. Proteolytic processing is essential to cell growth, differentiation, remodeling, and homeostasis as well as inflammation and immune response. Typical protein half-lives range from hours to a few days, so that within all living cells, precursor proteins are being cleaved to their active form, signal sequences proteolytically removed from targeted proteins, and aged or defective proteins degraded by proteolysis. Proteases function in bacterial, parasitic, and viral invasion and replication within a host. Four principal categories of mammalian proteases have been identified based on active site structure, mechanism of action, and overall three-dimensional structure. (Beynon, R.J. and J.S. Bond (1994) Proteolytic Enzymes: A Practical Approach, Oxford University Press, New York NY, pp. 1-5.)
The serine proteases (SPs) have a serine residue, usually within a conserved sequence, in an active site composed of the serine, an aspartate, and a histidine residue. SPs include the digestive enzymes trypsin and chymotrypsin, components of the complement cascade and the blood-clotting cascade, and enzymes that control extracellular protein degradation. The main SP sub-families are trypases, which cleave after arginine or lysine; aspartases, which cleave after aspartate; chymases, which cleave after phenylalanine or leucine; metases, which cleave after methionine; and serases which cleave after serine. Enterokinase, the initiator of intestinal digestion, is a serine protease found in the intestinal brush border, where it cleaves the acidic propeptide from trypsinogen to yield active trypsin (Kitamoto, Y. et al. (1994) Proc. Natl. Acad. Sci. USA 91:7588-7592). Prolylcarboxypeptidase, a lysosomal serine peptidase that cleaves peptides such as angiotensin II and III and [des-Arg9] bradykinin, shares sequence homology with members of both the serine carboxypeptidase and prolylendopeptidase families (Tan, F. et al. (1993) J. Biol. Chem. 268:16631-16638).
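The cleavage preferences listed above for the main SP sub-families can be illustrated with a short sketch that locates candidate cleavage positions in a peptide written in one-letter amino acid code. This is a simplified illustration only: it assumes cleavage depends solely on the residue immediately preceding the scissile bond, and the names `SP_SPECIFICITY`, `cleavage_sites`, and `fragments`, as well as the example peptide, are hypothetical.

```python
# Residues after which each SP sub-family cleaves, as listed in the text
# (one-letter codes: R = arginine, K = lysine, D = aspartate,
# F = phenylalanine, L = leucine, M = methionine, S = serine).
SP_SPECIFICITY = {
    "trypase": {"R", "K"},
    "aspartase": {"D"},
    "chymase": {"F", "L"},
    "metase": {"M"},
    "serase": {"S"},
}


def cleavage_sites(peptide, subfamily):
    """Return 1-based positions of residues after which cleavage may occur.

    The final residue is skipped because no peptide bond follows it.
    """
    residues = SP_SPECIFICITY[subfamily]
    return [i + 1 for i, aa in enumerate(peptide[:-1]) if aa in residues]


def fragments(peptide, subfamily):
    """Split the peptide at every candidate cleavage site."""
    bounds = [0] + cleavage_sites(peptide, subfamily) + [len(peptide)]
    return [peptide[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    example = "AKGRDFM"  # arbitrary illustrative sequence
    for name in SP_SPECIFICITY:
        print(name, cleavage_sites(example, name), fragments(example, name))
```

Real protease specificity also depends on neighboring residues and on substrate structure, so such a lookup gives candidate sites rather than predicted cleavage events.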
Cysteine proteases (CPs) have a cysteine as the major catalytic residue at an active site where catalysis proceeds via an intermediate thiol ester and is facilitated by adjacent histidine and aspartic acid residues. CPs are involved in diverse cellular processes ranging from the processing of precursor proteins to intracellular degradation. Mammalian CPs include lysosomal cathepsins and cytosolic calcium activated proteases, calpains. CPs are produced by monocytes, macrophages and other cells of the immune system which migrate to sites of inflammation and secrete molecules involved in tissue repair. Overabundance of these repair molecules plays a role in certain disorders. In autoimmune diseases such as rheumatoid arthritis, secretion of the cysteine peptidase cathepsin C degrades collagen, laminin, elastin and other structural proteins found in the extracellular matrix of bones.
Aspartic proteases are members of the cathepsin family of lysosomal proteases and include pepsin A, gastricsin, chymosin, renin, and cathepsins D and E. Aspartic proteases have a pair of aspartic acid residues in the active site, and are most active in the pH 2-3 range, in which one of the aspartate residues is ionized and the other un-ionized. Aspartic proteases include bacterial penicillopepsin, mammalian pepsin, renin, chymosin, and certain fungal proteases. Abnormal regulation and expression of cathepsins is evident in various inflammatory disease states. In cells isolated from inflamed synovia, the mRNA for stromelysin, cytokines, TIMP-1, cathepsin, gelatinase, and other molecules is preferentially expressed. Expression of cathepsins L and D is elevated in synovial tissues from patients with rheumatoid arthritis and osteoarthritis. Cathepsin L expression may also contribute to the influx of mononuclear cells which exacerbates the destruction of the rheumatoid synovium (Keyszer, G.M. (1995) Arthritis Rheum. 38:976-984). The increased expression and differential regulation of the cathepsins are linked to the metastatic potential of a variety of cancers and as such are of therapeutic and prognostic interest (Chambers, A.F. et al. (1993) Crit. Rev. Oncog. 4:95-114).
Metalloproteases have active sites that include two glutamic acid residues and one histidine residue that serve as binding sites for zinc. Carboxypeptidases A and B are the principal mammalian metalloproteases. Both are exoproteases with similar structures and active sites. Carboxypeptidase A, like chymotrypsin, prefers C-terminal aromatic and aliphatic side chains of hydrophobic nature, whereas carboxypeptidase B is directed toward basic arginine and lysine residues. Glycoprotease (GCP), or O-sialoglycoprotein endopeptidase, is a metallopeptidase which specifically cleaves O-sialoglycoproteins such as glycophorin A.
Another metallopeptidase, placental leucine aminopeptidase (P-LAP), degrades several peptide hormones such as oxytocin and vasopressin, suggesting a role in maintaining homeostasis during pregnancy, and is expressed in several tissues (Rogi, T. et al. (1996) J. Biol. Chem. 271:56-61).
Ubiquitin proteases are associated with the ubiquitin conjugation system (UCS), a major pathway for the degradation of cellular proteins in eukaryotic cells and some bacteria. The UCS mediates the elimination of abnormal proteins and regulates the half-lives of important regulatory proteins that control cellular processes such as gene transcription and cell cycle progression. In the UCS pathway, proteins targeted for degradation are conjugated to ubiquitin, a small, heat-stable protein. The ubiquitinated protein is then recognized and degraded by the proteasome, a large, multisubunit proteolytic enzyme complex, and ubiquitin is released for reutilization by ubiquitin protease. The UCS is implicated in the degradation of mitotic cyclic kinases, oncoproteins, tumor suppressors such as p53, viral proteins, cell surface receptors associated with signal transduction, transcriptional regulators, and mutated or damaged proteins (Ciechanover, A. (1994) Cell 79:13-21). A murine proto-oncogene, Unp, encodes a nuclear ubiquitin protease whose overexpression leads to oncogenic transformation of NIH3T3 cells, and the human homolog of this gene is consistently elevated in small cell tumors and adenocarcinomas of the lung (Gray, D.A. (1995) Oncogene 10:2179-2183).
Signal Peptidases
The mechanism for the translocation process into the endoplasmic reticulum (ER) involves the recognition of an N-terminal signal peptide on the elongating protein. The signal peptide directs the protein and attached ribosome to a receptor on the ER membrane. The polypeptide chain passes through a pore in the ER membrane into the lumen while the N-terminal signal peptide remains attached at the membrane surface. The process is completed when signal peptidase located inside the ER cleaves the signal peptide from the protein and releases the protein into the lumen.
Protease Inhibitors
Protease inhibitors and other regulators of protease activity control the activity and effects of proteases. Protease inhibitors have been shown to control pathogenesis in animal models of proteolytic disorders (Murphy, G. (1991) Agents Actions Suppl. 35:69-76). Low levels of the cystatins, low molecular weight inhibitors of the cysteine proteases, correlate with malignant progression of tumors (Calkins, C. et al. (1995) Biol. Biochem. Hoppe Seyler 376:71-80). Serpins are inhibitors of mammalian plasma serine proteases. Many serpins serve to regulate the blood clotting cascade and/or the complement cascade in mammals. Sp32 is a positive regulator of the mammalian acrosomal protease, acrosin, that binds the proenzyme, proacrosin, and thereby aids in packaging the enzyme into the acrosomal matrix (Baba, T. et al. (1994) J. Biol. Chem. 269:10133-10140). The Kunitz family of serine protease inhibitors is characterized by one or more "Kunitz domains" containing a series of cysteine residues that are regularly spaced over approximately 50 amino acid residues and form three intrachain disulfide bonds. Members of this family include aprotinin, tissue factor pathway inhibitor (TFPI-1 and TFPI-2), inter-α-trypsin inhibitor, and bikunin (Marlor, C.W. et al. (1997) J. Biol. Chem. 272:12202-12208). Members of this family are potent inhibitors (in the nanomolar range) of serine proteases such as kallikrein and plasmin. Aprotinin has clinical utility in the reduction of perioperative blood loss.
A major portion of all proteins synthesized in eukaryotic cells is synthesized on the cytosolic surface of the endoplasmic reticulum (ER). Before these immature proteins are distributed to other organelles in the cell or are secreted, they must be transported into the interior lumen of the ER where post-translational modifications are performed. These modifications include protein folding, the formation of disulfide bonds, and N-linked glycosylation.
Protein Isomerases
Protein folding in the ER is aided by two principal types of protein isomerases, protein disulfide isomerase (PDI) and peptidyl-prolyl isomerase (PPI). PDI catalyzes the oxidation of free sulfhydryl groups in cysteine residues to form intramolecular disulfide bonds in proteins. PPI, an enzyme that catalyzes the isomerization of certain proline imidic bonds in oligopeptides and proteins, is considered to govern one of the rate-limiting steps in the folding of many proteins to their final functional conformation. The cyclophilins represent a major class of PPI that was originally identified as the major receptor for the immunosuppressive drug cyclosporin A (Handschumacher, R.E. et al. (1984) Science 226:544-547).
Protein Glycosylation
The glycosylation of most soluble secreted and membrane-bound proteins by oligosaccharides linked to asparagine residues in proteins is also performed in the ER. This reaction is catalyzed by a membrane-bound enzyme, oligosaccharyl transferase. Although the exact purpose of this "N-linked" glycosylation is unknown, the presence of oligosaccharides tends to make a glycoprotein resistant to protease digestion. In addition, oligosaccharides attached to cell-surface proteins called selectins are known to function in cell-cell adhesion processes (Alberts, B. et al. (1994) Molecular Biology of the Cell, Garland Publishing Co., New York NY, p. 608). "O-linked" glycosylation of proteins also occurs in the ER by the addition of N-acetylgalactosamine to the hydroxyl group of a serine or threonine residue followed by the sequential addition of other sugar residues to the first. This process is catalyzed by a series of glycosyltransferases, each specific for a particular donor sugar nucleotide and acceptor molecule (Lodish, H. et al. (1995) Molecular Cell Biology, W.H. Freeman and Co., New York NY, pp. 700-708). In many cases, both N- and O-linked oligosaccharides appear to be required for the secretion of proteins or the movement of plasma membrane glycoproteins to the cell surface.
An additional glycosylation mechanism operates in the ER specifically to target lysosomal enzymes to lysosomes and prevent their secretion. Lysosomal enzymes in the ER receive an N-linked oligosaccharide, like plasma membrane and secreted proteins, but are then phosphorylated on one or two mannose residues. The phosphorylation of mannose residues occurs in two steps, the first step being the addition of an N-acetylglucosamine phosphate residue by N-acetylglucosamine phosphotransferase, and the second the removal of the N-acetylglucosamine group by phosphodiesterase. The phosphorylated mannose residue then targets the lysosomal enzyme to a mannose 6-phosphate receptor which transports it to a lysosome vesicle (Lodish, supra, pp. 708-711).
Chaperones
Molecular chaperones are proteins that aid in the proper folding of immature proteins and refolding of improperly folded ones, the assembly of protein subunits, and the transport of unfolded proteins across membranes. Chaperones are also called heat-shock proteins (hsp) because of their tendency to be expressed in dramatically increased amounts following brief exposure of cells to elevated temperatures. This latter property most likely reflects the need for them in the refolding of proteins that have become denatured by the high temperatures. Chaperones may be divided into several classes according to their location, function, and molecular weight, and include hsp60, TCP1, hsp70, hsp40 (also called DnaJ), and hsp90. For example, hsp90 binds to steroid hormone receptors, represses transcription in the absence of the ligand, and provides proper folding of the ligand-binding domain of the receptor in the presence of the hormone (Burston, S.G. and A.R. Clarke (1995) Essays Biochem. 29:125-136). Hsp60 and hsp70 chaperones aid in the transport and folding of newly synthesized proteins. Hsp70 acts early in protein folding, binding a newly synthesized protein before it leaves the ribosome and transporting the protein to the mitochondria or ER before releasing the folded protein. Hsp60, along with hsp10, binds misfolded proteins and gives them the opportunity to refold correctly. All chaperones share an affinity for hydrophobic patches on incompletely folded proteins and the ability to hydrolyze ATP. The energy of ATP hydrolysis is used to release the hsp-bound protein in its properly folded state (Alberts, supra, pp. 214, 571-572).
Nucleic Acid Synthesis and Modification Molecules
Polymerases
DNA and RNA replication are critical processes for cell replication and function. DNA and RNA replication are mediated by the enzymes DNA and RNA polymerase, respectively, by a "templating" process in which the nucleotide sequence of a DNA or RNA strand is copied by complementary base-pairing into a complementary nucleic acid sequence of either DNA or RNA. However, there are fundamental differences between the two processes.
DNA polymerase catalyzes the stepwise addition of a deoxyribonucleotide to the 3'-OH end of a polynucleotide strand (the primer strand) that is paired to a second (template) strand. The new DNA strand therefore grows in the 5' to 3' direction (Alberts, B. et al. (1994) The Molecular Biology of the Cell, Garland Publishing Inc., New York NY, pp. 251-254). The substrates for the polymerization reaction are the corresponding deoxynucleotide triphosphates, which must base-pair with the correct nucleotide on the template strand in order to be recognized by the polymerase. Because DNA exists as a double-stranded helix, each of the two strands may serve as a template for the formation of a new complementary strand. Each of the two daughter cells of the dividing cell therefore inherits a new DNA double helix containing one old and one new strand. Thus, DNA is said to be replicated "semiconservatively" by DNA polymerase. In addition to the synthesis of new DNA, DNA polymerase is also involved in the repair of damaged DNA as discussed below under "Ligases."
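The templating and semiconservative behavior described above can be illustrated with a short sketch. It is a simplification under stated assumptions: the function names `synthesize_complement` and `replicate` and the example sequence are hypothetical, and priming, proofreading, and the replication fork are all ignored.

```python
# Watson-Crick pairing rules for DNA.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesize_complement(template_5to3: str) -> str:
    """Return the strand built on this template, written 5'->3'.

    The template is read 3'->5' while the new strand grows 5'->3',
    so the result is the reverse complement of the template.
    """
    return "".join(PAIR[base] for base in reversed(template_5to3))


def replicate(duplex: tuple[str, str]) -> list[tuple[str, str]]:
    """Semiconservative replication: each daughter keeps one parental strand."""
    strand_a, strand_b = duplex
    return [
        (strand_a, synthesize_complement(strand_a)),   # old strand_a, new partner
        (synthesize_complement(strand_b), strand_b),   # new partner, old strand_b
    ]


# Invented sequence; both strands are written 5'->3'.
parent = ("ATGCCGTT", synthesize_complement("ATGCCGTT"))
daughters = replicate(parent)
print(daughters)
```

Each daughter duplex carries the same sequence as the parent, with one strand inherited and one newly synthesized.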
In contrast to DNA polymerase, RNA polymerase uses a DNA template strand to "transcribe" DNA into RNA using ribonucleotide triphosphates as substrates. Like DNA polymerization, RNA polymerization proceeds in a 5' to 3' direction by addition of a ribonucleoside monophosphate to the 3'-OH end of a growing RNA chain. DNA transcription generates messenger RNAs (mRNA) that carry information for protein synthesis, as well as the transfer, ribosomal, and other RNAs that have structural or catalytic functions. In eukaryotes, three discrete RNA polymerases synthesize the three different types of RNA (Alberts, supra, pp. 367-368). RNA polymerase I makes the large ribosomal RNAs, RNA polymerase II makes the mRNAs that will be translated into proteins, and RNA polymerase III makes a variety of small, stable RNAs, including 5S ribosomal RNA and the transfer RNAs (tRNA). In all cases, RNA synthesis is initiated by binding of the RNA polymerase to a promoter region on the DNA, and synthesis begins at a start site within the promoter. Synthesis is completed at a broad, general stop or termination region in the DNA where both the polymerase and the completed RNA chain are released.
Ligases
DNA repair is the process by which accidental base changes, such as those produced by oxidative damage, hydrolytic attack, or uncontrolled methylation of DNA, are corrected before replication or transcription of the DNA can occur. Because of the efficiency of the DNA repair process, fewer than one in one thousand accidental base changes causes a mutation (Alberts, supra, pp. 245-249). The three steps common to most types of DNA repair are (1) excision of the damaged or altered base or nucleotide by DNA nucleases, leaving a gap; (2) insertion of the correct nucleotide in this gap by DNA polymerase using the complementary strand as the template; and (3) sealing the break left between the inserted nucleotide(s) and the existing DNA strand by DNA ligase. In the last reaction, DNA ligase uses the energy from ATP hydrolysis to activate the 5' end of the broken phosphodiester bond before forming the new bond with the 3'-OH of the DNA strand. In Bloom's syndrome, an inherited human disease, individuals are partially deficient in DNA ligation and consequently have an increased incidence of cancer (Alberts, supra, p. 247).
Nucleases
Nucleases comprise enzymes that hydrolyze DNA (DNases) and enzymes that hydrolyze RNA (RNases). They serve different purposes in nucleic acid metabolism. Nucleases hydrolyze the phosphodiester bonds between adjacent nucleotides either at internal positions (endonucleases) or at the terminal 3' or 5' nucleotide positions (exonucleases). A DNA exonuclease activity in DNA polymerase, for example, serves to remove improperly paired nucleotides attached to the 3'-OH end of the growing DNA strand by the polymerase and thereby serves a "proofreading" function. As mentioned above, DNA endonuclease activity is involved in the excision step of the DNA repair process.
RNases also serve a variety of functions. For example, RNase P is a ribonucleoprotein enzyme which cleaves the 5' end of pre-tRNAs as part of their maturation process. RNase H digests the RNA strand of an RNA/DNA hybrid. Such hybrids occur in cells invaded by retroviruses, and RNase H is an important enzyme in the retroviral replication cycle. Pancreatic RNase secreted by the pancreas into the intestine hydrolyzes RNA present in ingested foods. RNase activity in serum and cell extracts is elevated in a variety of cancers and infectious diseases (Schein, C.H. (1997) Nat. Biotechnol. 15:529-536). Regulation of RNase activity is being investigated as a means to control tumor angiogenesis, allergic reactions, viral infection and replication, and fungal infections.
Methylases
Methylation of specific nucleotides occurs in both DNA and RNA, and serves different functions in the two macromolecules. Methylation of cytosine residues to form 5-methyl cytosine in DNA occurs specifically at CG sequences which are base-paired with one another in the DNA double helix. This pattern of methylation is passed from generation to generation during DNA replication by an enzyme called "maintenance methylase" that acts preferentially on those CG sequences that are base-paired with a CG sequence that is already methylated. Such methylation appears to distinguish active from inactive genes by preventing the binding of regulatory proteins that "turn on" the gene, while permitting the binding of proteins that inactivate the gene (Alberts, supra, pp. 448-451). In RNA metabolism, "tRNA methylase" produces one of several nucleotide modifications in tRNA that affect the conformation and base-pairing of the molecule and facilitate the recognition of the appropriate mRNA codons by specific tRNAs. The primary methylation pattern is the dimethylation of guanine residues to form N,N-dimethyl guanine.
Helicases and Single-Stranded Binding Proteins
Helicases are enzymes that destabilize and unwind double helix structures in both DNA and RNA. Since DNA replication occurs more or less simultaneously on both strands, the two strands must first separate to generate a replication "fork" for DNA polymerase to act on. Two types of replication proteins contribute to this process, DNA helicases and single-stranded binding proteins. DNA helicases hydrolyze ATP and use the energy of hydrolysis to separate the DNA strands. Single-stranded binding proteins (SSBs) then bind to the exposed DNA strands without covering the bases, thereby temporarily stabilizing them for templating by the DNA polymerase (Alberts, supra, pp. 255-256).
RNA helicases also alter and regulate RNA conformation and secondary structure. Like the DNA helicases, RNA helicases utilize energy derived from ATP hydrolysis to destabilize and unwind RNA duplexes. The most well-characterized and ubiquitous family of RNA helicases is the DEAD-box family, so named for the conserved B-type ATP-binding motif which is diagnostic of proteins in this family. Over 40 DEAD-box helicases have been identified in organisms as diverse as bacteria, insects, yeast, amphibians, mammals, and plants. DEAD-box helicases function in diverse processes such as translation initiation, splicing, ribosome assembly, and RNA editing, transport, and stability. Some DEAD-box helicases play tissue- and stage-specific roles in spermatogenesis and embryogenesis. Overexpression of the DEAD-box 1 protein (DDX1) may play a role in the progression of neuroblastoma (Nb) and retinoblastoma (Rb) tumors (Godbout, R. et al. (1998) J. Biol. Chem. 273:21161-21168). These observations suggest that DDX1 may promote or enhance tumor progression by altering the normal secondary structure and expression levels of RNA in cancer cells. Other DEAD-box helicases have been implicated either directly or indirectly in tumorigenesis (discussed in Godbout, supra). For example, murine p68 is mutated in ultraviolet light-induced tumors, and human DDX6 is located at a chromosomal breakpoint associated with B-cell lymphoma. Similarly, a chimeric protein composed of DDX10 and NUP98, a nucleoporin protein, may be involved in the pathogenesis of certain myeloid malignancies.
Topoisomerases
Besides the need to separate DNA strands prior to replication, the two strands must be "unwound" from one another prior to their separation by DNA helicases. This function is performed by proteins known as DNA topoisomerases. DNA topoisomerase effectively acts as a reversible nuclease that hydrolyzes a phosphodiester bond in a DNA strand, permitting the two strands to rotate freely about one another to remove the strain of the helix, and then rejoins the original phosphodiester bond between the two strands. Two types of DNA topoisomerase exist, types I and II. DNA topoisomerase I causes a single-strand break in a DNA helix to allow the rotation of the two strands of the helix about the remaining phosphodiester bond in the opposite strand. DNA topoisomerase II causes a transient break in both strands of a DNA helix where two double helices cross over one another. This type of topoisomerase can efficiently separate two interlocked DNA circles (Alberts, supra, pp. 260-262). Type II topoisomerases are largely confined to proliferating cells in eukaryotes, such as cancer cells. For this reason they are targets for anticancer drugs. Topoisomerase II has been implicated in multi-drug resistance (MDR) as it appears to aid in the repair of DNA damage inflicted by DNA binding agents such as doxorubicin and vincristine.
Recombinases
Genetic recombination is the process of rearranging DNA sequences within an organism's genome to provide genetic variation for the organism in response to changes in the environment. DNA recombination allows variation in the particular combination of genes present in an individual's genome, as well as the timing and level of expression of these genes (see Alberts, supra, pp. 263-273). Two broad classes of genetic recombination are commonly recognized, general recombination and site-specific recombination. General recombination involves genetic exchange between any homologous pair of DNA sequences usually located on two copies of the same chromosome. The process is aided by enzymes called recombinases that "nick" one strand of a DNA duplex more or less randomly and permit exchange with the complementary strand of another duplex. The process does not normally change the arrangement of genes on a chromosome. In site-specific recombination, the recombinase recognizes specific nucleotide sequences present in one or both of the recombining molecules. Base-pairing is not involved in this form of recombination, which therefore does not require DNA homology between the recombining molecules. Unlike general recombination, this form of recombination can alter the relative positions of nucleotide sequences in chromosomes.
Splicing Factors
Various proteins are necessary for processing of transcribed RNAs in the nucleus. Pre-mRNA processing steps include capping at the 5' end with methylguanosine, polyadenylating the 3' end, and splicing to remove introns. The primary RNA transcript from DNA is a faithful copy of the gene containing both exon and intron sequences, and the latter sequences must be cut out of the RNA transcript to produce an mRNA that codes for a protein. This "splicing" of the mRNA sequence takes place in the nucleus with the aid of a large, multicomponent ribonucleoprotein complex known as a spliceosome. The spliceosomal complex is composed of five small nuclear ribonucleoprotein particles (snRNPs) designated U1, U2, U4, U5, and U6, and a number of additional proteins. Each snRNP contains a single species of snRNA and about ten proteins. The RNA components of some snRNPs recognize and base pair with intron consensus sequences. The protein components mediate spliceosome assembly and the splicing reaction. Autoantibodies to snRNP proteins are found in the blood of patients with systemic lupus erythematosus (Stryer, L. (1995) Biochemistry, W.H. Freeman and Company, New York NY, p. 863).
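The net effect of splicing, removal of intron sequences and joining of the flanking exons, can be sketched as a string operation. This is an illustration only: the function name `splice`, the interval convention, and the toy transcript are assumptions, and the intron coordinates are supplied by hand rather than recognized from consensus sequences as a spliceosome would do.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns given as (start, end) 0-based half-open intervals."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        if start < position or end > len(pre_mrna):
            raise ValueError("introns must not overlap and must lie within the transcript")
        exons.append(pre_mrna[position:start])  # keep the exon before this intron
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # keep the final exon
    return "".join(exons)


# Toy transcript: upper case marks exons, lower case marks introns.
transcript = "AUGGCC" + "guaaguuuuag" + "GAAUUC" + "guccccag" + "UAA"
mature = splice(transcript, [(6, 17), (23, 31)])
print(mature)
```

Supplying explicit coordinates keeps the example small; predicting splice sites from sequence is a separate and much harder problem.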
Adhesion Molecules
The surface of a cell is rich in transmembrane proteoglycans, glycoproteins, glycolipids, and receptors. These macromolecules mediate adhesion with other cells and with components of the extracellular matrix (ECM). The interaction of the cell with its surroundings profoundly influences cell shape, strength, flexibility, motility, and adhesion. These dynamic properties are intimately associated with signal transduction pathways controlling cell proliferation and differentiation, tissue construction, and embryonic development.
Cadherins
Cadherins comprise a family of calcium-dependent glycoproteins that function in mediating cell-cell adhesion in virtually all solid tissues of multicellular organisms. These proteins share multiple repeats of a cadherin-specific motif, and the repeats form the folding units of the cadherin extracellular domain. Cadherin molecules cooperate to form focal contacts, or adhesion plaques, between adjacent epithelial cells. The cadherin family includes the classical cadherins and protocadherins. Classical cadherins include the E-cadherin, N-cadherin, and P-cadherin subfamilies. E-cadherin is present on many types of epithelial cells and is especially important for embryonic development. N-cadherin is present on nerve, muscle, and lens cells and is also critical for embryonic development. P-cadherin is present on cells of the placenta and epidermis. Recent studies report that protocadherins are involved in a variety of cell-cell interactions (Suzuki, S.T. (1996) J. Cell Sci. 109:2609-2611). The intracellular anchorage of cadherins is regulated by their dynamic association with catenins, a family of cytoplasmic signal transduction proteins associated with the actin cytoskeleton. The anchorage of cadherins to the actin cytoskeleton appears to be regulated by protein tyrosine phosphorylation, and the cadherins are the target of phosphorylation-induced junctional disassembly (Aberle, H. et al. (1996) J. Cell. Biochem. 61:514-523).
Integrins
Integrins are ubiquitous transmembrane adhesion molecules that link the ECM to the internal cytoskeleton. Integrins are composed of two noncovalently associated transmembrane glycoprotein subunits called α and β. Integrins function as receptors that play a role in signal transduction. For example, binding of integrin to its extracellular ligand may stimulate changes in intracellular calcium levels or protein kinase activity (Sjaastad, M.D. and W.J. Nelson (1997) BioEssays 19:47-55). At least ten cell surface receptors of the integrin family recognize the ECM component fibronectin, which is involved in many different biological processes including cell migration and embryogenesis (Johansson, S. et al. (1997) Front. Biosci. 2:D126-D146).
Lectins
Lectins comprise a ubiquitous family of extracellular glycoproteins which bind cell surface carbohydrates specifically and reversibly, resulting in the agglutination of cells (reviewed in Drickamer, K. and M.E. Taylor (1993) Annu. Rev. Cell Biol. 9:237-264). This function is particularly important for activation of the immune response. Lectins mediate the agglutination and mitogenic stimulation of lymphocytes at sites of inflammation (Lasky, L.A. (1991) J. Cell. Biochem. 45:139-146; Paietta, E. et al. (1989) J. Immunol. 143:2850-2857).
Lectins are further classified into subfamilies based on carbohydrate-binding specificity and other criteria. The galectin subfamily, in particular, includes lectins that bind β-galactoside carbohydrate moieties in a thiol-dependent manner (reviewed in Hadari, Y.R. et al. (1998) J. Biol. Chem. 270:3447-3453). Galectins are widely expressed and developmentally regulated. Because all galectins lack an N-terminal signal peptide, it is suggested that galectins are externalized through an atypical secretory mechanism. Two classes of galectins have been defined based on molecular weight and oligomerization properties. Small galectins form homodimers and are about 14 to 16 kilodaltons in mass, while large galectins are monomeric and about 29 to 37 kilodaltons. Galectins contain a characteristic carbohydrate recognition domain (CRD). The CRD is about 140 amino acids and contains several stretches of about 1 to 10 amino acids which are highly conserved among all galectins. A particular 6-amino acid motif within the CRD contains conserved tryptophan and arginine residues which are critical for carbohydrate binding. The CRD of some galectins also contains cysteine residues which may be important for disulfide bond formation. Secondary structure predictions indicate that the CRD forms several β-sheets.
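The two galectin size classes described above amount to a simple rule on subunit mass. The sketch below encodes only the approximate ranges given in the text; the function name `classify_galectin` is hypothetical, the boundaries are treated as hard cut-offs purely for illustration, and a real assignment would also consider oligomerization behavior.

```python
def classify_galectin(mass_kda: float) -> str:
    """Assign a galectin size class from subunit mass in kilodaltons.

    Ranges follow the approximate figures in the text:
    about 14-16 kDa (small, homodimer-forming) and about 29-37 kDa (large, monomeric).
    """
    if 14 <= mass_kda <= 16:
        return "small"
    if 29 <= mass_kda <= 37:
        return "large"
    return "unclassified"


# Invented masses used only to exercise the function.
for mass in (14.5, 31.0, 22.0):
    print(mass, classify_galectin(mass))
```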
Galectins play a number of roles in diseases and conditions associated with cell-cell and cell-matrix interactions. For example, certain galectins associate with sites of inflammation and bind to cell surface immunoglobulin E molecules. In addition, galectins may play an important role in cancer metastasis. Galectin overexpression is correlated with the metastatic potential of cancers in humans and mice. Moreover, anti-galectin antibodies inhibit processes associated with cell transformation, such as cell aggregation and anchorage-independent growth (see, for example, Su, Z.-Z. et al. (1996) Proc. Natl. Acad. Sci. USA 93:7252-7257).
Selectins
Selectins, or LEC-CAMs, comprise a specialized lectin subfamily involved primarily in inflammation and leukocyte adhesion (reviewed in Lasky, supra). Selectins mediate the recruitment of leukocytes from the circulation to sites of acute inflammation and are expressed on the surface of vascular endothelial cells in response to cytokine signaling. Selectins bind to specific ligands on the leukocyte cell membrane and enable the leukocyte to adhere to and migrate along the endothelial surface. Binding of selectin to its ligand leads to polarized rearrangement of the actin cytoskeleton and stimulates signal transduction within the leukocyte (Brenner, B. et al. (1997) Biochem. Biophys. Res. Commun. 231:802-807; Hidari, K.I. et al. (1997) J. Biol. Chem. 272:28750-28756). Members of the selectin family possess three characteristic motifs: a lectin or carbohydrate recognition domain; an epidermal growth factor-like domain; and a variable number of short consensus repeats (scr or "sushi" repeats) which are also present in complement regulatory proteins. The selectins include lymphocyte adhesion molecule-1 (Lam-1 or L-selectin), endothelial leukocyte adhesion molecule-1 (ELAM-1 or E-selectin), and granule membrane protein-140 (GMP-140 or P-selectin) (Johnston, G.I. et al. (1989) Cell 56:1033-1044).
Antigen Recognition Molecules
All vertebrates have developed sophisticated and complex immune systems that provide protection from viral, bacterial, fungal, and parasitic infections. A key feature of the immune system is its ability to distinguish foreign molecules, or antigens, from "self" molecules. This ability is mediated primarily by secreted and transmembrane proteins expressed by leukocytes (white blood cells) such as lymphocytes, granulocytes, and monocytes. Most of these proteins belong to the immunoglobulin (Ig) superfamily, members of which contain one or more repeats of a conserved structural domain. This Ig domain is composed of antiparallel β sheets joined by a disulfide bond in an arrangement called the Ig fold. Members of the Ig superfamily include T-cell receptors, major histocompatibility (MHC) proteins, antibodies, and immune cell-specific surface markers such as CD4, CD8, and CD28.
MHC proteins are cell surface markers that bind to and present foreign antigens to T cells. MHC molecules are classified as either class I or class II. Class I MHC molecules (MHC I) are expressed on the surface of almost all cells and are involved in the presentation of antigen to cytotoxic T cells. For example, a cell infected with virus will degrade intracellular viral proteins and express the protein fragments bound to MHC I molecules on the cell surface. The MHC I/antigen complex is recognized by cytotoxic T cells which destroy the infected cell and the virus within. Class II MHC molecules are expressed primarily on specialized antigen-presenting cells of the immune system, such as B cells and macrophages. These cells ingest foreign proteins from the extracellular fluid and express MHC II/antigen complex on the cell surface. This complex activates helper T cells, which then secrete cytokines and other factors that stimulate the immune response. MHC molecules also play an important role in organ rejection following transplantation. Rejection occurs when the recipient's T cells respond to foreign MHC molecules on the transplanted organ in the same way as to self MHC molecules bound to foreign antigen. (Reviewed in Alberts, B. et al. (1994) Molecular Biology of the Cell, Garland Publishing, New York NY, pp. 1229-1246.)
Antibodies, or immunoglobulins, are either expressed on the surface of B cells or secreted by B cells into the circulation. Antibodies bind and neutralize foreign antigens in the blood and other extracellular fluids. The prototypical antibody is a tetramer consisting of two identical heavy polypeptide chains (H-chains) and two identical light polypeptide chains (L-chains) interlinked by disulfide bonds. This arrangement confers the characteristic Y-shape to antibody molecules. Antibodies are classified based on their H-chain composition. The five antibody classes, IgA, IgD, IgE, IgG, and IgM, are defined by the α, δ, ε, γ, and μ H-chain types. There are two types of L-chains, κ and λ, either of which may associate as a pair with any H-chain pair. IgG, the most common class of antibody found in the circulation, is tetrameric, while the other classes of antibodies are generally variants or multimers of this basic structure.
H-chains and L-chains each contain an N-terminal variable region and a C-terminal constant region. The constant region consists of about 110 amino acids in L-chains and about 330 or 440 amino acids in H-chains. The amino acid sequence of the constant region is nearly identical among H- or L-chains of a particular class. The variable region consists of about 110 amino acids in both H- and L-chains. However, the amino acid sequence of the variable region differs among H- or L-chains of a particular class. Within each H- or L-chain variable region are three hypervariable regions of extensive sequence diversity, each consisting of about 5 to 10 amino acids. In the antibody molecule, the H- and L-chain hypervariable regions come together to form the antigen recognition site.
(Reviewed in Alberts, supra, pp. 1206-1213 and 1216-1217.)
Both H-chains and L-chains contain repeated Ig domains. For example, a typical H-chain contains four Ig domains, three of which occur within the constant region and one of which occurs within the variable region and contributes to the formation of the antigen recognition site. Likewise, a typical L-chain contains two Ig domains, one of which occurs within the constant region and one of which occurs within the variable region.
The immune system is capable of recognizing and responding to any foreign molecule that enters the body. Therefore, the immune system must be armed with a full repertoire of antibodies against all potential antigens. Such antibody diversity is generated by somatic rearrangement of gene segments encoding variable and constant regions. These gene segments are joined together by site-specific recombination which occurs between highly conserved DNA sequences that flank each gene segment. Because there are hundreds of different gene segments, millions of unique genes can be generated combinatorially. In addition, imprecise joining of these segments and an unusually high rate of somatic mutation within these segments further contribute to the generation of a diverse antibody population.
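The combinatorial arithmetic behind "hundreds of segments, millions of genes" can be illustrated with a minimal sketch. The pool names (V, D, J) and the pool sizes below are hypothetical values chosen only to show the multiplication; they are not taken from this document and are not measured segment counts.

```python
from math import prod

def combinatorial_diversity(segment_pools):
    """Count the chains obtainable by picking one segment from each pool."""
    return prod(segment_pools.values())

# Hypothetical pool sizes, for illustration only (not measured values).
heavy_pools = {"V": 100, "D": 10, "J": 5}
light_pools = {"V": 100, "J": 5}

heavy_chains = combinatorial_diversity(heavy_pools)
light_chains = combinatorial_diversity(light_pools)

# Any heavy chain may pair with any light chain, so the counts multiply.
antibodies = heavy_chains * light_chains

print(heavy_chains, light_chains, antibodies)
```

With these assumed sizes, 220 segments in total already give 2,500,000 heavy/light combinations, before imprecise joining and somatic mutation are counted.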
T-cell receptors are both structurally and functionally related to antibodies. (Reviewed in Alberts, supra, pp. 1228-1229.) T-cell receptors are cell surface proteins that bind foreign antigens and mediate diverse aspects of the immune response. A typical T-cell receptor is a heterodimer comprised of two disulfide-linked polypeptide chains called α and β. Each chain is about 280 amino acids in length and contains one variable region and one constant region. Each variable or constant region folds into an Ig domain. The variable regions from the α and β chains come together in the heterodimer to form the antigen recognition site. T-cell receptor diversity is generated by somatic rearrangement of gene segments encoding the α and β chains. T-cell receptors recognize small peptide antigens that are expressed on the surface of antigen-presenting cells and pathogen-infected cells. These peptide antigens are presented on the cell surface in association with major histocompatibility proteins which provide the proper context for antigen recognition.
Secreted and Extracellular Matrix Molecules
Protein secretion is essential for cellular function. Protein secretion is mediated by a signal peptide located at the amino terminus of the protein to be secreted. The signal peptide is comprised of about ten to twenty hydrophobic amino acids which target the nascent protein from the ribosome to the endoplasmic reticulum (ER). Proteins targeted to the ER may either proceed through the secretory pathway or remain in any of the secretory organelles such as the ER, Golgi apparatus, or lysosomes. Proteins that transit through the secretory pathway are either secreted into the extracellular space or retained in the plasma membrane. Secreted proteins are often synthesized as inactive precursors that are activated by post-translational processing events during transit through the secretory pathway. Such events include glycosylation, proteolysis, and removal of the signal peptide by a signal peptidase. Other events that may occur during protein transport include chaperone-dependent unfolding and folding of the nascent protein and interaction of the protein with a receptor or pore complex. Examples of secreted proteins with amino terminal signal peptides include receptors, extracellular matrix molecules, cytokines, hormones, growth and differentiation factors, neuropeptides, vasomediators, ion channels, transporters/pumps, and proteases. (Reviewed in Alberts, B. et al. (1994) Molecular Biology of the Cell. Garland Publishing, New York NY, pp. 557-560, 582-592.)
The extracellular matrix (ECM) is a complex network of glycoproteins, polysaccharides, proteoglycans, and other macromolecules that are secreted from the cell into the extracellular space. The ECM remains in close association with the cell surface and provides a supportive meshwork that profoundly influences cell shape, motility, strength, flexibility, and adhesion.
In fact, adhesion of a cell to its surrounding matrix is required for cell survival except in the case of metastatic tumor cells, which have overcome the need for cell-ECM anchorage. This phenomenon suggests that the ECM plays a critical role in the molecular mechanisms of growth control and metastasis. (Reviewed in Ruoslahti, E. (1996) Sci. Am. 275:72-77.) Furthermore, the ECM determines the structure and physical properties of connective tissue and is particularly important for morphogenesis and other processes associated with embryonic development and pattern formation.
The collagens comprise a family of ECM proteins that provide structure to bone, teeth, skin, ligaments, tendons, cartilage, blood vessels, and basement membranes. Multiple collagen proteins have been identified. Three collagen molecules fold together in a triple helix stabilized by interchain disulfide bonds. Bundles of these triple helices then associate to form fibrils. Collagen primary structure consists of hundreds of (Gly-X-Y) repeats where about a third of the X and Y residues are Pro. Glycines are crucial to helix formation as the bulkier amino acid sidechains cannot fold into the triple helical conformation. Because of these strict sequence requirements, mutations in collagen genes have severe consequences. Osteogenesis imperfecta patients have brittle bones that fracture easily; in severe cases patients die in utero or at birth. Ehlers-Danlos syndrome patients have hyperelastic skin, hypermobile joints, and susceptibility to aortic and intestinal rupture. Chondrodysplasia patients have short stature and ocular disorders. Alport syndrome patients have hematuria, sensorineural deafness, and eye lens deformation. (Isselbacher, K.J. et al. (1994) Harrison's Principles of Internal Medicine, McGraw-Hill, Inc., New York NY, pp. 2105-2117; and Creighton, T.E. (1984) Proteins, Structures and Molecular Principles, W.H. Freeman and Company, New York NY, pp. 191-197.)
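The (Gly-X-Y) sequence requirement described above can be checked mechanically. The following is a minimal sketch, assuming a one-letter amino acid string that starts in frame with the repeat; the function name and the short test peptides are hypothetical and are not sequences from this document.

```python
def gly_xy_profile(seq):
    """Report where a sequence breaks the (Gly-X-Y)n pattern.

    Returns (broken, pro_fraction):
      broken       - indices of triplets whose first residue is not Gly
      pro_fraction - fraction of X and Y positions occupied by Pro
    """
    seq = seq.upper()
    if len(seq) == 0 or len(seq) % 3 != 0:
        raise ValueError("sequence length must be a positive multiple of 3")
    triplets = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    broken = [i for i, t in enumerate(triplets) if t[0] != "G"]
    xy = "".join(t[1:] for t in triplets)
    pro_fraction = xy.count("P") / len(xy)
    return broken, pro_fraction

# Hypothetical peptides: an intact repeat and one with Gly replaced by Ser.
intact = gly_xy_profile("GPAGPKGEP")
mutant = gly_xy_profile("GPAGPKSEP")
print(intact, mutant)
```

A non-empty `broken` list marks the kind of glycine substitution that the text identifies as disruptive to triple helix formation.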
Elastin and related proteins confer elasticity to tissues such as skin, blood vessels, and lungs. Elastin is a highly hydrophobic protein of about 750 amino acids that is rich in proline and glycine residues. Elastin molecules are highly cross-linked, forming an extensive extracellular network of fibers and sheets. Elastin fibers are surrounded by a sheath of microfibrils which are composed of a number of glycoproteins, including fibrillin. Mutations in the gene encoding fibrillin are responsible for Marfan's syndrome, a genetic disorder characterized by defects in connective tissue. In severe cases, the aortas of afflicted individuals are prone to rupture. (Reviewed in Alberts, supra, pp. 984-986.)
Fibronectin is a large ECM glycoprotein found in all vertebrates. Fibronectin exists as a dimer of two subunits, each containing about 2,500 amino acids. Each subunit folds into a rod-like structure containing multiple domains. The domains each contain multiple repeated modules, the most common of which is the type III fibronectin repeat. The type III fibronectin repeat is about 90 amino acids in length and is also found in other ECM proteins and in some plasma membrane and cytoplasmic proteins. Furthermore, some type III fibronectin repeats contain a characteristic tripeptide consisting of Arginine-Glycine-Aspartic acid (RGD). The RGD sequence is recognized by the integrin family of cell surface receptors and is also found in other ECM proteins. Disruption of both copies of the gene encoding fibronectin causes early embryonic lethality in mice. The mutant embryos display extensive morphological defects, including defects in the formation of the notochord, somites, heart, blood vessels, neural tube, and extraembryonic structures. (Reviewed in Alberts, supra, pp. 986-987.)
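Locating the RGD tripeptide in a protein sequence is a simple string scan. The sketch below is illustrative only: the function name is hypothetical, the input is assumed to be a one-letter amino acid string, and the example peptides are made up rather than taken from fibronectin.

```python
def find_rgd(seq):
    """Return the 0-based start index of every Arg-Gly-Asp (RGD) tripeptide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] == "RGD"]

# Made-up peptides for illustration.
single = find_rgd("AAGRGDSPK")
double = find_rgd("RGDRGD")
absent = find_rgd("AAGRGESPK")
print(single, double, absent)
```

A match indicates only that the tripeptide is present; whether a given site is actually bound by an integrin is not something this scan can determine.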
Laminin is a major glycoprotein component of the basal lamina which underlies and supports epithelial cell sheets. Laminin is one of the first ECM proteins synthesized in the developing embryo. Laminin is an 850 kilodalton protein composed of three polypeptide chains joined in the shape of a cross by disulfide bonds. Laminin is especially important for angiogenesis and in particular, for guiding the formation of capillaries. (Reviewed in Alberts, supra, pp. 990-991.)
There are many other types of proteinaceous ECM components, most of which can be classified as proteoglycans. Proteoglycans are composed of unbranched polysaccharide chains (glycosaminoglycans) attached to protein cores. Common proteoglycans include aggrecan, betaglycan, decorin, perlecan, serglycin, and syndecan-1. Some of these molecules not only provide mechanical support, but also bind to extracellular signaling molecules, such as fibroblast growth factor and transforming growth factor β, suggesting a role for proteoglycans in cell-cell communication and cell growth. (Reviewed in Alberts, supra, pp. 973-978.) Likewise, the glycoproteins tenascin-C and tenascin-R are expressed in developing and lesioned neural tissue and provide stimulatory and anti-adhesive (inhibitory) properties, respectively, for axonal growth. (Faissner, A. (1997) Cell Tissue Res. 290:331-341.)
Cytoskeletal Molecules
The cytoskeleton is a cytoplasmic network of protein fibers that mediate cell shape, structure, and movement. The cytoskeleton supports the cell membrane and forms tracks along which organelles and other elements move in the cytosol. The cytoskeleton is a dynamic structure that allows cells to adopt various shapes and to carry out directed movements. Major cytoskeletal fibers include the microtubules, the microfilaments, and the intermediate filaments. Motor proteins, including myosin, dynein, and kinesin, drive movement of or along the fibers. The motor protein dynamin drives the formation of membrane vesicles. Accessory or associated proteins modify the structure or activity of the fibers while cytoskeletal membrane anchors connect the fibers to the cell membrane.
Tubulins
Microtubules, cytoskeletal fibers with a diameter of about 24 nm, have multiple roles in the cell. Bundles of microtubules form cilia and flagella, which are whip-like extensions of the cell membrane that are necessary for sweeping materials across an epithelium and for swimming of sperm, respectively. Marginal bands of microtubules in red blood cells and platelets are important for these cells' pliability. Organelles, membrane vesicles, and proteins are transported in the cell along tracks of microtubules. For example, microtubules run through nerve cell axons, allowing bidirectional transport of materials and membrane vesicles between the cell body and the nerve terminal. Failure to supply the nerve terminal with these vesicles blocks the transmission of neural signals. Microtubules are also critical to chromosomal movement during cell division. Both stable and short-lived populations of microtubules exist in the cell.
Microtubules are polymers of GTP-binding tubulin protein subunits. Each subunit is a heterodimer of α- and β-tubulin, multiple isoforms of which exist. The hydrolysis of GTP is linked to the addition of tubulin subunits at the end of a microtubule. The subunits interact head to tail to form protofilaments; the protofilaments interact side to side to form a microtubule. A microtubule is polarized, one end ringed with α-tubulin and the other with β-tubulin, and the two ends differ in their rates of assembly. Generally, each microtubule is composed of 13 protofilaments although 11 or 15 protofilament-microtubules are sometimes found. Cilia and flagella contain doublet microtubules. Microtubules grow from specialized structures known as centrosomes or microtubule-organizing centers (MTOCs). MTOCs may contain one or two centrioles, which are pinwheel arrays of triplet microtubules. The basal body, the organizing center located at the base of a cilium or flagellum, contains one centriole. Gamma tubulin present in the MTOC is important for nucleating the polymerization of α- and β-tubulin heterodimers but does not polymerize into microtubules.
Microtubule-Associated Proteins
Microtubule-associated proteins (MAPs) have roles in the assembly and stabilization of microtubules. One major family of MAPs, assembly MAPs, can be identified in neurons as well as non-neuronal cells. Assembly MAPs are responsible for cross-linking microtubules in the cytosol. These MAPs are organized into two domains: a basic microtubule-binding domain and an acidic projection domain. The projection domain is the binding site for membranes, intermediate filaments, or other microtubules. Based on sequence analysis, assembly MAPs can be further grouped into two types: Type I and Type II. Type I MAPs, which include MAP1A and MAP1B, are large, filamentous molecules that co-purify with microtubules and are abundantly expressed in brain and testes. Type I MAPs contain several repeats of a positively-charged amino acid sequence motif that binds and neutralizes negatively charged tubulin, leading to stabilization of microtubules. MAP1A and MAP1B are each derived from a single precursor polypeptide that is subsequently proteolytically processed to generate one heavy chain and one light chain.
Another light chain, LC3, is a 16.4 kDa molecule that binds MAP1A, MAP1B, and microtubules. It is suggested that LC3 is synthesized from a source other than the MAP1A or MAP1B transcripts, and that the expression of LC3 may be important in regulating the microtubule binding activity of MAP1A and MAP1B during cell proliferation (Mann, S.S. et al. (1994) J. Biol. Chem. 269:11492-11497).
Type II MAPs, which include MAP2a, MAP2b, MAP2c, MAP4, and Tau, are characterized by three to four copies of an 18-residue sequence in the microtubule-binding domain. MAP2a, MAP2b, and MAP2c are found only in dendrites, MAP4 is found in non-neuronal cells, and Tau is found in axons and dendrites of nerve cells. Alternative splicing of the Tau mRNA leads to the existence of multiple forms of Tau protein. Tau phosphorylation is altered in neurodegenerative disorders such as Alzheimer's disease, Pick's disease, progressive supranuclear palsy, corticobasal degeneration, and familial frontotemporal dementia and Parkinsonism linked to chromosome 17. The altered Tau phosphorylation leads to a collapse of the microtubule network and the formation of intraneuronal Tau aggregates (Spillantini, M.G. and M. Goedert (1998) Trends Neurosci. 21:428-433).
The protein pericentrin is found in the MTOC and has a role in microtubule assembly.
Actins
Microfilaments, cytoskeletal filaments with a diameter of about 7-9 nm, are vital to cell locomotion, cell shape, cell adhesion, cell division, and muscle contraction. Assembly and disassembly of the microfilaments allow cells to change their morphology. Microfilaments are the polymerized form of actin, the most abundant intracellular protein in the eukaryotic cell. Human cells contain six isoforms of actin. The three α-actins are found in different kinds of muscle, nonmuscle β-actin and nonmuscle γ-actin are found in nonmuscle cells, and another γ-actin is found in intestinal smooth muscle cells. G-actin, the monomeric form of actin, polymerizes into polarized, helical F-actin filaments, accompanied by the hydrolysis of ATP to ADP. Actin filaments associate to form bundles and networks, providing a framework to support the plasma membrane and determine cell shape. These bundles and networks are connected to the cell membrane. In muscle cells, thin filaments containing actin slide past thick filaments containing the motor protein myosin during contraction. A family of actin-related proteins exist that are not part of the actin cytoskeleton, but rather associate with microtubules and dynein.
Actin-Associated Proteins
Actin-associated proteins have roles in cross-linking, severing, and stabilization of actin filaments and in sequestering actin monomers. Several of the actin-associated proteins have multiple functions. Bundles and networks of actin filaments are held together by actin cross-linking proteins. These proteins have two actin-binding sites, one for each filament. Short cross-linking proteins promote bundle formation while longer, more flexible cross-linking proteins promote network formation. Calmodulin-like calcium-binding domains in actin cross-linking proteins allow calcium regulation of cross-linking. Group I cross-linking proteins have unique actin-binding domains and include the 30 kD protein, EF-1a, fascin, and scruin. Group II cross-linking proteins have a 7,000-MW actin-binding domain and include villin and dematin. Group III cross-linking proteins have pairs of a 26,000-MW actin-binding domain and include fimbrin, spectrin, dystrophin, ABP 120, and filamin.
Severing proteins regulate the length of actin filaments by breaking them into short pieces or by blocking their ends. Severing proteins include gCAP39, severin (fragmin), gelsolin, and villin. Capping proteins can cap the ends of actin filaments, but cannot break filaments. Capping proteins include CapZ and tropomodulin. The proteins thymosin and profilin sequester actin monomers in the cytosol, allowing a pool of unpolymerized actin to exist. The actin-associated proteins tropomyosin, troponin, and caldesmon regulate muscle contraction in response to calcium.
Intermediate Filaments and Associated Proteins
Intermediate filaments (IFs) are cytoskeletal fibers with a diameter of about 10 nm, intermediate between that of microfilaments and microtubules. IFs serve structural roles in the cell, reinforcing cells and organizing cells into tissues. IFs are particularly abundant in epidermal cells and in neurons. IFs are extremely stable, and, in contrast to microfilaments and microtubules, do not function in cell motility.
Five types of IF proteins are known in mammals. Type I and Type II proteins are the acidic and basic keratins, respectively. Heterodimers of the acidic and basic keratins are the building blocks of keratin IFs. Keratins are abundant in soft epithelia such as skin and cornea, hard epithelia such as nails and hair, and in epithelia that line internal body cavities. Mutations in keratin genes lead to epithelial diseases including epidermolysis bullosa simplex, bullous congenital ichthyosiform erythroderma (epidermolytic hyperkeratosis), non-epidermolytic and epidermolytic palmoplantar keratoderma, ichthyosis bullosa of Siemens, pachyonychia congenita, and white sponge nevus. Some of these diseases result in severe skin blistering. (See, e.g., Wawersik, M. et al. (1997) J. Biol. Chem. 272:32557-32565; and Corden L.D. and W.H. McLean (1996) Exp. Dermatol. 5:297-307.)
Type III IF proteins include desmin, glial fibrillary acidic protein, vimentin, and peripherin. Desmin filaments in muscle cells link myofibrils into bundles and stabilize sarcomeres in contracting muscle. Glial fibrillary acidic protein filaments are found in the glial cells that surround neurons and astrocytes. Vimentin filaments are found in blood vessel endothelial cells, some epithelial cells, and mesenchymal cells such as fibroblasts, and are commonly associated with microtubules. Vimentin filaments may have roles in keeping the nucleus and other organelles in place in the cell. Type IV IFs include the neurofilaments and nestin. Neurofilaments, composed of three polypeptides NF-L, NF-M, and NF-H, are frequently associated with microtubules in axons. Neurofilaments are responsible for the radial growth and diameter of an axon, and ultimately for the speed of nerve impulse transmission. Changes in phosphorylation and metabolism of neurofilaments are observed in neurodegenerative diseases including amyotrophic lateral sclerosis, Parkinson's disease, and Alzheimer's disease (Julien, J.P. and W.E. Mushynski (1998) Prog. Nucleic Acid Res. Mol. Biol. 61:1-23). Type V IFs, the lamins, are found in the nucleus where they support the nuclear membrane.
IFs have a central α-helical rod region interrupted by short nonhelical linker segments. The rod region is bracketed, in most cases, by non-helical head and tail domains. The rod regions of intermediate filament proteins associate to form a coiled-coil dimer. A highly ordered assembly process leads from the dimers to the IFs. Neither ATP nor GTP is needed for IF assembly, unlike that of microfilaments and microtubules.
IF-associated proteins (IFAPs) mediate the interactions of IFs with one another and with other cell structures. IFAPs cross-link IFs into a bundle, into a network, or to the plasma membrane, and may cross-link IFs to the microfilament and microtubule cytoskeleton. Microtubules and IFs are in particular closely associated. IFAPs include BPAG1, plakoglobin, desmoplakin I, desmoplakin II, plectin, ankyrin, filaggrin, and lamin B receptor.
Cytoskeletal-Membrane Anchors
Cytoskeletal fibers are attached to the plasma membrane by specific proteins. These attachments are important for maintaining cell shape and for muscle contraction. In erythrocytes, the spectrin-actin cytoskeleton is attached to the cell membrane by three proteins, band 4.1, ankyrin, and adducin. Defects in this attachment result in abnormally shaped cells which are more rapidly degraded by the spleen, leading to anemia. In platelets, the spectrin-actin cytoskeleton is also linked to the membrane by ankyrin; a second actin network is anchored to the membrane by filamin. In muscle cells the protein dystrophin links actin filaments to the plasma membrane; mutations in the dystrophin gene lead to Duchenne muscular dystrophy. In adherens junctions and adhesion plaques the peripheral membrane proteins α-actinin and vinculin attach actin filaments to the cell membrane.
IFs are also attached to membranes by cytoskeletal-membrane anchors. The nuclear lamina is attached to the inner surface of the nuclear membrane by the lamin B receptor. Vimentin IFs are attached to the plasma membrane by ankyrin and plectin. Desmosome and hemidesmosome membrane junctions hold together epithelial cells of organs and skin. These membrane junctions allow shear forces to be distributed across the entire epithelial cell layer, thus providing strength and rigidity to the epithelium. IFs in epithelial cells are attached to the desmosome by plakoglobin and desmoplakins. The proteins that link IFs to hemidesmosomes are not known. Desmin IFs surround the sarcomere in muscle and are linked to the plasma membrane by paranemin, synemin, and ankyrin.
Myosin-related Motor Proteins
Myosins are actin-activated ATPases, found in eukaryotic cells, that couple hydrolysis of ATP with motion. Myosin provides the motor function for muscle contraction and intracellular movements such as phagocytosis and rearrangement of cell contents during mitotic cell division (cytokinesis). The contractile unit of skeletal muscle, termed the sarcomere, consists of highly ordered arrays of thin actin-containing filaments and thick myosin-containing filaments. Crossbridges form between the thick and thin filaments, and the ATP-dependent movement of myosin heads within the thick filaments pulls the thin filaments, shortening the sarcomere and thus the muscle fiber.
Myosins are composed of one or two heavy chains and associated light chains. Myosin heavy chains contain an amino-terminal motor or head domain, a neck that is the site of light-chain binding, and a carboxy-terminal tail domain. The tail domains may associate to form an α-helical coiled coil. Conventional myosins, such as those found in muscle tissue, are composed of two myosin heavy-chain subunits, each associated with two light-chain subunits that bind at the neck region and play a regulatory role. Unconventional myosins, believed to function in intracellular motion, may contain either one or two heavy chains and associated light chains. There is evidence for about 25 myosin heavy chain genes in vertebrates, more than half of them unconventional.
Dynein-related Motor Proteins
Dyneins are (-) end-directed motor proteins which act on microtubules. Two classes of dyneins, cytosolic and axonemal, have been identified. Cytosolic dyneins are responsible for translocation of materials along cytoplasmic microtubules, for example, transport from the nerve terminal to the cell body and transport of endocytic vesicles to lysosomes. Cytoplasmic dyneins are also reported to play a role in mitosis. Axonemal dyneins are responsible for the beating of flagella and cilia. Dynein on one microtubule doublet walks along the adjacent microtubule doublet. This sliding force produces bending forces that cause the flagellum or cilium to beat. Dyneins have a native mass between 1000 and 2000 kDa and contain either two or three force-producing heads driven by the hydrolysis of ATP. The heads are linked via stalks to a basal domain which is composed of a highly variable number of accessory intermediate and light chains.
Kinesin-related Motor Proteins
Kinesins are (+) end-directed motor proteins which act on microtubules. The prototypical kinesin molecule is involved in the transport of membrane-bound vesicles and organelles. This function is particularly important for axonal transport in neurons. Kinesin is also important in all cell types for the transport of vesicles from the Golgi complex to the endoplasmic reticulum. This role is critical for maintaining the identity and functionality of these secretory organelles.
Kinesins define a ubiquitous, conserved family of over 50 proteins that can be classified into at least 8 subfamilies based on primary amino acid sequence, domain structure, velocity of movement, and cellular function. (Reviewed in Moore, J.D. and S.A. Endow (1996) Bioessays 18:207-219; and Hoyt, A.M. (1994) Curr. Opin. Cell Biol. 6:63-68.) The prototypical kinesin molecule is a heterotetramer comprised of two heavy polypeptide chains (KHCs) and two light polypeptide chains (KLCs). The KHC subunits are typically referred to as "kinesin." KHC is about 1000 amino acids in length, and KLC is about 550 amino acids in length. Two KHCs dimerize to form a rod-shaped molecule with three distinct regions of secondary structure. At one end of the molecule is a globular motor domain that functions in ATP hydrolysis and microtubule binding. Kinesin motor domains are highly conserved and share over 70% identity. Beyond the motor domain is an α-helical coiled-coil region which mediates dimerization. At the other end of the molecule is a fan-shaped tail that associates with molecular cargo. The tail is formed by the interaction of the KHC C-termini with the two KLCs.
Members of the more divergent subfamilies of kinesins are called kinesin-related proteins (KRPs), many of which function during mitosis in eukaryotes (Hoyt, supra). Some KRPs are required for assembly of the mitotic spindle. In vivo and in vitro analyses suggest that these KRPs exert force on microtubules that comprise the mitotic spindle, resulting in the separation of spindle poles. Phosphorylation of KRP is required for this activity. Failure to assemble the mitotic spindle results in abortive mitosis and chromosomal aneuploidy, the latter condition being characteristic of cancer cells. In addition, a unique KRP, centromere protein E, localizes to the kinetochore of human mitotic chromosomes and may play a role in their segregation to opposite spindle poles.
Dynamin-related Motor Proteins
Dynamin is a large GTPase motor protein that functions as a "molecular pinchase," generating a mechanochemical force used to sever membranes. This activity is important in forming clathrin-coated vesicles from coated pits in endocytosis and in the biogenesis of synaptic vesicles in neurons. Binding of dynamin to a membrane leads to dynamin's self-assembly into spirals that may act to constrict a flat membrane surface into a tubule. GTP hydrolysis induces a change in conformation of the dynamin polymer that pinches the membrane tubule, leading to severing of the membrane tubule and formation of a membrane vesicle. Release of GDP and inorganic phosphate leads to dynamin disassembly. Following disassembly the dynamin may either dissociate from the membrane or remain associated to the vesicle and be transported to another region of the cell. Three homologous dynamin genes have been discovered, in addition to several dynamin-related proteins.
Conserved dynamin regions are the N-terminal GTP-binding domain, a central pleckstrin homology domain that binds membranes, a central coiled-coil region that may activate dynamin's GTPase activity, and a C-terminal proline-rich domain that contains several motifs that bind SH3 domains on other proteins. Some dynamin-related proteins do not contain the pleckstrin homology domain or the proline-rich domain. (See McNiven, M.A. (1998) Cell 94:151-154; Scaife, R.M. and R.L. Margolis (1997) Cell. Signal. 9:395-401.)
The cytoskeleton is reviewed in Lodish, H. et al. (1995) Molecular Cell Biology, Scientific American Books, New York NY.
Ribosomal Molecules
Ribosomal RNAs (rRNAs) are assembled, along with ribosomal proteins, into ribosomes, which are cytoplasmic particles that translate messenger RNA into polypeptides. The eukaryotic ribosome is composed of a 60S (large) subunit and a 40S (small) subunit, which together form the 80S ribosome. In addition to the 18S, 28S, 5S, and 5.8S rRNAs, the ribosome also contains more than fifty proteins. The ribosomal proteins have a prefix which denotes the subunit to which they belong, either L (large) or S (small). Ribosomal protein activities include binding rRNA and organizing the conformation of the junctions between rRNA helices (Woodson, S.A. and N.B. Leontis (1998) Curr. Opin. Struct. Biol. 8:294-300; Ramakrishnan, V. and S.W. White (1998) Trends Biochem. Sci. 23:208-212.) Three important sites are identified on the ribosome. The aminoacyl-tRNA site (A site) is where charged tRNAs (with the exception of the initiator-tRNA) bind on arrival at the ribosome. The peptidyl-tRNA site (P site) is where new peptide bonds are formed, as well as where the initiator tRNA binds. The exit site (E site) is where deacylated tRNAs bind prior to their release from the ribosome. (The ribosome is reviewed in Stryer, L. (1995) Biochemistry W.H. Freeman and Company, New York NY, pp. 888-908; and Lodish, H. et al. (1995) Molecular Cell Biology Scientific American Books, New York NY. pp. 119-138.)
Chromatin Molecules
The nuclear DNA of eukaryotes is organized into chromatin. Two types of chromatin are observed: euchromatin, some of which may be transcribed, and heterochromatin so densely packed that much of it is inaccessible to transcription. Chromatin packing thus serves to regulate protein expression in eukaryotes. Bacteria lack chromatin and the chromatin-packing level of gene regulation.
The fundamental unit of chromatin is the nucleosome of 200 DNA base pairs associated with two copies each of histones H2A, H2B, H3, and H4. Adjacent nucleosomes are linked by another class of histones, H1. Low molecular weight non-histone proteins called the high mobility group (HMG), associated with chromatin, may function in the unwinding of DNA and stabilization of single-stranded DNA. Chromodomain proteins function in compaction of chromatin into its transcriptionally silent heterochromatin form.
During mitosis, aU DNA is compacted into heterochromatin and transcription ceases. Transcription in interphase begins with the activation of a region of chromatin. Active chromatin is decondensed. Decondensation appears to be accompanied by changes in binding coefficient, o phosphorylation and acetylation states of chromatin histones. HMG proteins HMG13 and HMG17 selectively bind activated chromatin. Topoisomerases remove superheHcal tension on DNA. The activated region decondenses, aUowing gene regulatory proteins and transcription factors to assemble on the DNA.
Patterns of chromatin structure can be stably inherited, producing heritable patterns of gene 5 expression. In mammals, one of tihe two X chromosomes in each female ceU is inactivated by condensation to heterochromatin during zygote development. The inactive state of this chromosome is inherited, so that adult females are mosaics of clusters of paternal-X and maternal-X clonal ceU groups. The condensed X chromosome is reactivated in meiosis.
Chromatin is associated with disorders of protein expression such as thalassemia, a genetic anemia resulting from the removal of the locus control region (LCR) required for decondensation of the globin gene locus.
For a review of chromatin structure and function see Alberts, B. et al. (1994) Molecular CeU Biology, third edition, Garland PubHshing, Inc., New York NY, pp. 351-354, 433-439.
Electron Transfer Associated Molecules
Electron carriers such as cytochromes accept electrons from NADH or FADH2 and donate them to other electron carriers. Most electron-transferring proteins, except ubiquinone, are prosthetic groups such as flavins, heme, FeS clusters, and copper, bound to inner membrane proteins. Adrenodoxin, for example, is an FeS protein that forms a complex with NADPH:adrenodoxin reductase and cytochrome P450. Cytochromes contain a heme prosthetic group, a porphyrin ring containing a tightly bound iron atom. Electron transfer reactions play a crucial role in cellular energy production.
Energy is produced by the oxidation of glucose and fatty acids. Glucose is initially converted to pyruvate in the cytoplasm. Fatty acids and pyruvate are transported to the mitochondria for complete oxidation to CO2 coupled by enzymes to the transport of electrons from NADH and FADH2 to oxygen and to the synthesis of ATP (oxidative phosphorylation) from ADP and Pi.
Pyruvate is transported into the mitochondria and converted to acetyl-CoA for oxidation via the citric acid cycle, involving pyruvate dehydrogenase components, dihydrolipoyl transacetylase, and dihydrolipoyl dehydrogenase. Enzymes involved in the citric acid cycle include: citrate synthetase, aconitases, isocitrate dehydrogenase, alpha-ketoglutarate dehydrogenase complex including transsuccinylases, succinyl CoA synthetase, succinate dehydrogenase, fumarases, and malate dehydrogenase. Acetyl CoA is oxidized to CO2 with concomitant formation of NADH, FADH2, and GTP. In oxidative phosphorylation, the transfer of electrons from NADH and FADH2 to oxygen by dehydrogenases is coupled to the synthesis of ATP from ADP and Pi by the F0F1 ATPase complex in the mitochondrial inner membrane. Enzyme complexes responsible for electron transport and ATP synthesis include the F0F1 ATPase complex, ubiquinone(CoQ)-cytochrome c reductase, ubiquinone reductase, cytochrome b, cytochrome c1, FeS protein, and cytochrome c oxidase.
ATP synthesis requires membrane transport enzymes including the phosphate transporter and the ATP-ADP antiport protein. The ATP-binding cassette (ABC) superfamily has also been suggested as belonging to the mitochondrial transport group (Hogue, D.L. et al. (1999) J. Mol. Biol. 285:379-389). Brown fat uncoupling protein dissipates oxidative energy as heat, and may be involved in the fever response to infection and trauma (Cannon, B. et al. (1998) Ann. NY Acad. Sci. 856:171-187).
Mitochondria are oval-shaped organelles comprising an outer membrane, a tightly folded inner membrane, an intermembrane space between the outer and inner membranes, and a matrix inside the inner membrane. The outer membrane contains many porin molecules that allow ions and charged molecules to enter the intermembrane space, while the inner membrane contains a variety of transport proteins that transfer only selected molecules. Mitochondria are the primary sites of energy production in cells. Mitochondria contain a small amount of DNA. Human mitochondrial DNA encodes 13 proteins, 22 tRNAs, and 2 rRNAs. Mitochondrial-DNA encoded proteins include NADH-Q reductase, a cytochrome reductase subunit, cytochrome oxidase subunits, and ATP synthase subunits.
Electron-transfer reactions also occur outside the mitochondria in locations such as the endoplasmic reticulum, which plays a crucial role in lipid and protein biosynthesis. Cytochrome b5 is a central electron donor for various reductive reactions occurring on the cytoplasmic surface of liver endoplasmic reticulum. Cytochrome b5 has been found in Golgi, plasma, endoplasmic reticulum (ER), and microbody membranes.
For a review of mitochondrial metabolism and regulation, see Lodish, H. et al. (1995) Molecular Cell Biology, Scientific American Books, New York NY, pp. 745-797 and Stryer (1995) Biochemistry, W.H. Freeman and Co., San Francisco CA, pp. 529-558, 988-989.
The majority of mitochondrial proteins are encoded by nuclear genes, are synthesized on cytosolic ribosomes, and are imported into the mitochondria. Nuclear-encoded proteins which are destined for the mitochondrial matrix typically contain positively-charged amino terminal signal sequences. Import of these preproteins from the cytoplasm requires a multisubunit protein complex in the outer membrane known as the translocase of outer mitochondrial membrane (TOM; previously designated MOM; Pfanner, N. et al. (1996) Trends Biochem. Sci. 21:51-52) and at least three inner membrane proteins which comprise the translocase of inner mitochondrial membrane (TIM; previously designated MIM; Pfanner, supra). An inside-negative membrane potential across the inner mitochondrial membrane is also required for preprotein import. Preproteins are recognized by surface receptor components of the TOM complex and are translocated through a proteinaceous pore formed by other TOM components. Proteins targeted to the matrix are then recognized by the import machinery of the TIM complex. The import systems of the outer and inner membranes can function independently (Segui-Real, B. et al. (1993) EMBO J. 12:2211-2218).
Once precursor proteins are in the mitochondria, the leader peptide is cleaved by a signal peptidase to generate the mature protein. Most leader peptides are removed in a one step process by a protease termed mitochondrial processing peptidase (MPP) (Paces, V. et al. (1993) Proc. Natl. Acad. Sci. USA 90:5355-5358). In some cases a two-step process occurs in which MPP generates an intermediate precursor form which is cleaved by a second enzyme, mitochondrial intermediate peptidase, to generate the mature protein.
Mitochondrial dysfunction leads to impaired calcium buffering, generation of free radicals that may participate in deleterious intracellular and extracellular processes, changes in mitochondrial permeability and oxidative damage which is observed in several neurodegenerative diseases. Neurodegenerative diseases linked to mitochondrial dysfunction include some forms of Alzheimer's disease, Friedreich's ataxia, familial amyotrophic lateral sclerosis, and Huntington's disease (Beal, M.F. (1998) Biochim. Biophys. Acta 1366:211-213). The myocardium is heavily dependent on oxidative metabolism, so mitochondrial dysfunction often leads to heart disease (DiMauro, S. and M. Hirano (1998) Curr. Opin. Cardiol. 13:190-197). Mitochondria are implicated in disorders of cell proliferation, since they play an important role in a cell's decision to proliferate or self-destruct through apoptosis. The oncoprotein Bcl-2, for example, promotes cell proliferation by stabilizing mitochondrial membranes so that apoptosis signals are not released (Susin, S.A. (1998) Biochim. Biophys. Acta 1366:151-165).
Transcription Factor Molecules
Multicellular organisms are comprised of diverse cell types that differ dramatically both in structure and function. The identity of a cell is determined by its characteristic pattern of gene expression, and different cell types express overlapping but distinctive sets of genes throughout development. Spatial and temporal regulation of gene expression is critical for the control of cell proliferation, cell differentiation, apoptosis, and other processes that contribute to organismal development. Furthermore, gene expression is regulated in response to extracellular signals that mediate cell-cell communication and coordinate the activities of different cell types. Appropriate gene regulation also ensures that cells function efficiently by expressing only those genes whose functions are required at a given time.
Transcriptional regulatory proteins are essential for the control of gene expression. Some of these proteins function as transcription factors that initiate, activate, repress, or terminate gene transcription. Transcription factors generally bind to the promoter, enhancer, and upstream regulatory regions of a gene in a sequence-specific manner, although some factors bind regulatory elements within or downstream of a gene's coding region. Transcription factors may bind to a specific region of DNA singly or as a complex with other accessory factors. (Reviewed in Lewin, B. (1990) Genes IV, Oxford University Press, New York NY, and Cell Press, Cambridge MA, pp. 554-570.)
The double helix structure and repeated sequences of DNA create topological and chemical features which can be recognized by transcription factors. These features are hydrogen bond donor and acceptor groups, hydrophobic patches, major and minor grooves, and regular, repeated stretches of sequence which induce distinct bends in the helix. Typically, transcription factors recognize specific DNA sequence motifs of about 20 nucleotides in length. Multiple, adjacent transcription factor-binding motifs may be required for gene regulation.
Many transcription factors incorporate DNA-binding structural motifs which comprise either α helices or β sheets that bind to the major groove of DNA. Four well-characterized structural motifs are helix-turn-helix, zinc finger, leucine zipper, and helix-loop-helix. Proteins containing these motifs may act alone as monomers, or they may form homo- or heterodimers that interact with DNA.
The helix-turn-helix motif consists of two α helices connected at a fixed angle by a short chain of amino acids. One of the helices binds to the major groove. Helix-turn-helix motifs are exemplified by the homeobox motif which is present in homeodomain proteins. These proteins are critical for specifying the anterior-posterior body axis during development and are conserved throughout the animal kingdom. The Antennapedia and Ultrabithorax proteins of Drosophila melanogaster are prototypical homeodomain proteins (Pabo, C.O. and R.T. Sauer (1992) Annu. Rev. Biochem. 61:1053-1095).
The zinc finger motif, which binds zinc ions, generally contains tandem repeats of about 30 amino acids consisting of periodically spaced cysteine and histidine residues. Examples of this sequence pattern, designated C2H2 and C3HC4 ("RING" finger), have been described (Lewin, supra). Zinc finger proteins each contain an α helix and an antiparallel β sheet whose proximity and conformation are maintained by the zinc ion. Contact with DNA is made by the arginine preceding the α helix and by the second, third, and sixth residues of the helix. Variants of the zinc finger motif include poorly defined cysteine-rich motifs which bind zinc or other metal ions. These motifs may not contain histidine residues and are generally nonrepetitive.
The leucine zipper motif comprises a stretch of amino acids rich in leucine which can form an amphipathic helix. This structure provides the basis for dimerization of two leucine zipper proteins. The region adjacent to the leucine zipper is usually basic, and upon protein dimerization, is optimally positioned for binding to the major groove. Proteins containing such motifs are generally referred to as bZIP transcription factors.
The helix-loop-helix motif (HLH) consists of a short α helix connected by a loop to a longer α helix. The loop is flexible and allows the two helices to fold back against each other and to bind to DNA. The transcription factor Myc contains a prototypical HLH motif.
Most transcription factors contain characteristic DNA binding motifs, and variations on the above motifs and new motifs have been and are currently being characterized (Faisst, S. and S. Meyer (1992) Nucleic Acids Res. 20:3-26).
Many neoplastic disorders in humans can be attributed to inappropriate gene expression. Malignant cell growth may result from either excessive expression of tumor promoting genes or insufficient expression of tumor suppressor genes (Cleary, M.L. (1992) Cancer Surv. 15:89-104). Chromosomal translocations may also produce chimeric loci which fuse the coding sequence of one gene with the regulatory regions of a second unrelated gene. Such an arrangement likely results in inappropriate gene transcription, potentially contributing to malignancy.
In addition, the immune system responds to infection or trauma by activating a cascade of events that coordinate the progressive selection, amplification, and mobilization of cellular defense mechanisms. A complex and balanced program of gene activation and repression is involved in this process. However, hyperactivity of the immune system as a result of improper or insufficient regulation of gene expression may result in considerable tissue or organ damage. This damage is well documented in immunological responses associated with arthritis, allergens, heart attack, stroke, and infections (Isselbacher, K.J. et al. (1996) Harrison's Principles of Internal Medicine, 13/e, McGraw Hill, Inc. and Teton Data Systems Software).
Furthermore, the generation of multicellular organisms is based upon the induction and coordination of cell differentiation at the appropriate stages of development. Central to this process is differential gene expression, which confers the distinct identities of cells and tissues throughout the body. Failure to regulate gene expression during development can result in developmental disorders. Human developmental disorders caused by mutations in zinc finger-type transcriptional regulators include: urogenital developmental abnormalities associated with WT1; Greig cephalopolysyndactyly, Pallister-Hall syndrome, and postaxial polydactyly type A (GLI3); and Townes-Brocks syndrome, characterized by anal, renal, limb, and ear abnormalities (SALL1) (Engelkamp, D. and V. van Heyningen (1996) Curr. Opin. Genet. Dev. 6:334-342; Kohlhase, J. et al. (1999) Am. J. Hum. Genet. 64:435-445).
Cell Membrane Molecules
Eukaryotic cells are surrounded by plasma membranes which enclose the cell and maintain an environment inside the cell that is distinct from its surroundings. In addition, eukaryotic organisms are distinct from prokaryotes in possessing many intracellular organelle and vesicle structures. Many of the metabolic reactions which distinguish eukaryotic biochemistry from prokaryotic biochemistry take place within these structures. The plasma membrane and the membranes surrounding organelles and vesicles are composed of phosphoglycerides, fatty acids, cholesterol, phospholipids, glycolipids, proteoglycans, and proteins. These components confer identity and functionality to the membranes with which they associate.
Integral Membrane Proteins
The majority of known integral membrane proteins are transmembrane proteins (TM) which are characterized by an extracellular, a transmembrane, and an intracellular domain. TM domains are typically comprised of 15 to 25 hydrophobic amino acids which are predicted to adopt an α-helical conformation. TM proteins are classified as bitopic (Types I and II) and polytopic (Types III and IV) (Singer, S.J. (1990) Annu. Rev. Cell Biol. 6:247-296). Bitopic proteins span the membrane once while polytopic proteins contain multiple membrane-spanning segments. TM proteins function as cell-surface receptors, receptor-interacting proteins, transporters of ions or metabolites, ion channels, cell anchoring proteins, and cell type-specific surface antigens.
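The description of TM domains as stretches of 15 to 25 hydrophobic amino acids, and the bitopic/polytopic distinction, can be sketched in Python as follows. This is a deliberately crude illustration, not a validated prediction method: the hydrophobic residue set, the requirement that a run be uninterrupted, and the function names are all assumptions made here. Practical predictors use hydropathy scales averaged over a sliding window.

```python
import re

# Assumed set of hydrophobic residues (one-letter codes); not defined in the text.
HYDROPHOBIC = "AILMFVWC"


def candidate_tm_segments(seq: str, min_len: int = 15, max_len: int = 25):
    """Return (start, end, segment) for each uninterrupted hydrophobic run
    whose length falls within [min_len, max_len]. Positions are 1-based."""
    segments = []
    for m in re.finditer(f"[{HYDROPHOBIC}]+", seq.upper()):
        if min_len <= len(m.group()) <= max_len:
            segments.append((m.start() + 1, m.end(), m.group()))
    return segments


def classify_topology(seq: str) -> str:
    """Label a sequence by its number of candidate membrane-spanning runs."""
    count = len(candidate_tm_segments(seq))
    if count == 0:
        return "no candidate TM segment"
    return "bitopic" if count == 1 else "polytopic"


if __name__ == "__main__":
    single_pass = "MKT" + "L" * 20 + "RRK"          # synthetic example sequence
    multi_pass = "L" * 18 + "KDE" + "A" * 16 + "RK"  # synthetic example sequence
    print(classify_topology(single_pass), classify_topology(multi_pass))
```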
Many membrane proteins (MPs) contain amino acid sequence motifs that target these proteins to specific subcellular sites. Examples of these motifs include PDZ domains, KDEL, RGD, NGR, and GSL sequence motifs, von Willebrand factor A (vWFA) domains, and EGF-like domains. RGD, NGR, and GSL motif-containing peptides have been used as drug delivery agents in targeted cancer treatment of tumor vasculature (Arap, W. et al. (1998) Science 279:377-380). Furthermore, MPs may also contain amino acid sequence motifs, such as the carbohydrate recognition domain (CRD), that mediate interactions with extracellular or intracellular molecules.
G-Protein Coupled Receptors
G-protein coupled receptors (GPCR) are a superfamily of integral membrane proteins which transduce extracellular signals. GPCRs include receptors for biogenic amines, lipid mediators of inflammation, peptide hormones, and sensory signal mediators. The structure of these highly-conserved receptors consists of seven hydrophobic transmembrane regions, an extracellular N-terminus, and a cytoplasmic C-terminus. Three extracellular loops alternate with three intracellular loops to link the seven transmembrane regions. Cysteine disulfide bridges connect the second and third extracellular loops. The most conserved regions of GPCRs are the transmembrane regions and the first two cytoplasmic loops. A conserved acidic-Arg-aromatic residue triplet present in the second cytoplasmic loop may interact with G proteins. A GPCR consensus pattern is characteristic of most proteins belonging to this superfamily (ExPASy PROSITE document PS00237; and Watson, S. and S. Arkinstall (1994) The G-protein Linked Receptor Facts Book, Academic Press, San Diego CA, pp. 2-6). Mutations and changes in transcriptional activation of GPCR-encoding genes have been associated with neurological disorders such as schizophrenia, Parkinson's disease, Alzheimer's disease, drug addiction, and feeding disorders.
Scavenger Receptors
Macrophage scavenger receptors with broad ligand specificity may participate in the binding of low density lipoproteins (LDL) and foreign antigens. Scavenger receptors types I and II are trimeric membrane proteins with each subunit containing a small N-terminal intracellular domain, a transmembrane domain, a large extracellular domain, and a C-terminal cysteine-rich domain. The extracellular domain contains a short spacer region, an α-helical coiled-coil region, and a triple helical collagen-like region. These receptors have been shown to bind a spectrum of ligands, including chemically modified lipoproteins and albumin, polyribonucleotides, polysaccharides, phospholipids, and asbestos (Matsumoto, A. et al. (1990) Proc. Natl. Acad. Sci. USA 87:9133-9137; and Elomaa, O. et al. (1995) Cell 80:603-609). The scavenger receptors are thought to play a key role in atherogenesis by mediating uptake of modified LDL in arterial walls, and in host defense by binding bacterial endotoxins, bacteria, and protozoa.
Tetraspan Family Proteins
The transmembrane 4 superfamily (TM4SF) or tetraspan family is a multigene family encoding type III integral membrane proteins (Wright, M.D. and M.G. Tomlinson (1994) Immunol. Today 15:588-594). The TM4SF is comprised of membrane proteins which traverse the cell membrane four times. Members of the TM4SF include platelet and endothelial cell membrane proteins, melanoma-associated antigens, leukocyte surface glycoproteins, colonal carcinoma antigens, tumor-associated antigens, and surface proteins of the schistosome parasites (Jankowski, S.A. (1994) Oncogene 9:1205-1211). Members of the TM4SF share about 25-30% amino acid sequence identity with one another.
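A percent identity figure such as the 25-30% cited above is computed from an alignment of two sequences. The following Python sketch shows one common way to do so, assuming the two sequences have already been aligned to equal length with "-" as the gap character. Definitions of the denominator vary between tools; the choice here (columns in which neither sequence has a gap) and the function name are assumptions for illustration, and the text does not state which definition underlies its figure.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over alignment columns without gaps.

    Both inputs must be rows of the same pairwise alignment (equal length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Synthetic aligned fragments, for illustration only.
    print(percent_identity("ACDE", "ACDF"))
```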
A number of TM4SF members have been implicated in signal transduction, control of cell adhesion, regulation of cell growth and proliferation, including development and oncogenesis, and cell motility, including tumor cell metastasis. Expression of TM4SF proteins is associated with a variety of tumors and the level of expression may be altered when cells are growing or activated.
Tumor Antigens
Tumor antigens are cell surface molecules that are differentially expressed in tumor cells relative to normal cells. Tumor antigens distinguish tumor cells immunologically from normal cells and provide diagnostic and therapeutic targets for human cancers (Takagi, S. et al. (1995) Int. J. Cancer 61:706-715; Liu, E. et al. (1992) Oncogene 7:1027-1032).
Leukocyte Antigens
Other types of cell surface antigens include those identified on leukocytic cells of the immune system. These antigens have been identified using systematic, monoclonal antibody (mAb)-based "shot gun" techniques. These techniques have resulted in the production of hundreds of mAbs directed against unknown cell surface leukocytic antigens. These antigens have been grouped into "clusters of differentiation" based on common immunocytochemical localization patterns in various differentiated and undifferentiated leukocytic cell types. Antigens in a given cluster are presumed to identify a single cell surface protein and are assigned a "cluster of differentiation" or "CD" designation. Some of the genes encoding proteins identified by CD antigens have been cloned and verified by standard molecular biology techniques. CD antigens have been characterized as both transmembrane proteins and cell surface proteins anchored to the plasma membrane via covalent attachment to fatty acid-containing glycolipids such as glycosylphosphatidylinositol (GPI). (Reviewed in Barclay, A.N. et al. (1995) The Leucocyte Antigen Facts Book, Academic Press, San Diego CA, pp. 17-20.)
Ion Channels
Ion channels are found in the plasma membranes of virtually every cell in the body. For example, chloride channels mediate a variety of cellular functions including regulation of membrane potentials and absorption and secretion of ions across epithelial membranes. Chloride channels also regulate the pH of organelles such as the Golgi apparatus and endosomes (see, e.g., Greger, R. (1988) Annu. Rev. Physiol. 50:111-122). Electrophysiological and pharmacological properties of chloride channels, including ion conductance, current-voltage relationships, and sensitivity to modulators, suggest that different chloride channels exist in muscles, neurons, fibroblasts, epithelial cells, and lymphocytes.
Many ion channels have sites for phosphorylation by one or more protein kinases including protein kinase A, protein kinase C, tyrosine kinase, and casein kinase II, all of which regulate ion channel activity in cells. Inappropriate phosphorylation of proteins in cells has been linked to changes in cell cycle progression and cell differentiation. Changes in the cell cycle have been linked to induction of apoptosis or cancer. Changes in cell differentiation have been linked to diseases and disorders of the reproductive system, immune system, skeletal muscle, and other organ systems.
Proton Pumps
Proton ATPases comprise a large class of membrane proteins that use the energy of ATP hydrolysis to generate an electrochemical proton gradient across a membrane. The resultant gradient may be used to transport other ions across the membrane (Na+, K+, or Cl-) or to maintain organelle pH. Proton ATPases are further subdivided into the mitochondrial F-ATPases, the plasma membrane ATPases, and the vacuolar ATPases. The vacuolar ATPases establish and maintain an acidic pH within various organelles involved in the processes of endocytosis and exocytosis (Mellman, I. et al. (1986) Annu. Rev. Biochem. 55:663-700).
Proton-coupled, 12 membrane-spanning domain transporters such as PEPT 1 and PEPT 2 are responsible for gastrointestinal absorption and for renal reabsorption of peptides using an electrochemical H+ gradient as the driving force. Another type of peptide transporter, the TAP transporter, is a heterodimer consisting of TAP 1 and TAP 2 and is associated with antigen processing. Peptide antigens are transported across the membrane of the endoplasmic reticulum by TAP so they can be expressed on the cell surface in association with MHC molecules. Each TAP protein consists of multiple hydrophobic membrane spanning segments and a highly conserved ATP-binding cassette (Boll, M. et al. (1996) Proc. Natl. Acad. Sci. USA 93:284-289). Pathogenic microorganisms, such as herpes simplex virus, may encode inhibitors of TAP-mediated peptide transport in order to evade immune surveillance (Marusina, K. and J.J. Monaco (1996) Curr. Opin. Hematol. 3:19-26).
ABC Transporters
The ATP-binding cassette (ABC) transporters, also called the "traffic ATPases", comprise a superfamily of membrane proteins that mediate transport and channel functions in prokaryotes and eukaryotes (Higgins, C.F. (1992) Annu. Rev. Cell Biol. 8:67-113). ABC proteins share a similar overall structure and significant sequence homology. All ABC proteins contain a conserved domain of approximately two hundred amino acid residues which includes one or more nucleotide binding domains. Mutations in ABC transporter genes are associated with various disorders, such as hyperbilirubinemia II/Dubin-Johnson syndrome, recessive Stargardt's disease, X-linked adrenoleukodystrophy, multidrug resistance, celiac disease, and cystic fibrosis.
Peripheral and Anchored Membrane Proteins
Some membrane proteins are not membrane-spanning but are attached to the plasma membrane via membrane anchors or interactions with integral membrane proteins. Membrane anchors are covalently joined to a protein post-translationally and include such moieties as prenyl, myristyl, and glycosylphosphatidyl inositol groups. Membrane localization of peripheral and anchored proteins is important for their function in processes such as receptor-mediated signal transduction. For example, prenylation of Ras is required for its localization to the plasma membrane and for its normal and oncogenic functions in signal transduction.
Vesicle Coat Proteins
Intercellular communication is essential for the development and survival of multicellular organisms. Cells communicate with one another through the secretion and uptake of protein signaling molecules. The uptake of proteins into the cell is achieved by the endocytic pathway, in which the interaction of extracellular signaling molecules with plasma membrane receptors results in the formation of plasma membrane-derived vesicles that enclose and transport the molecules into the cytosol. These transport vesicles fuse with and mature into endosomal and lysosomal (digestive) compartments. The secretion of proteins from the cell is achieved by exocytosis, in which molecules inside of the cell proceed through the secretory pathway. In this pathway, molecules transit from the ER to the Golgi apparatus and finally to the plasma membrane, where they are secreted from the cell.
Several steps in the transit of material along the secretory and endocytic pathways require the formation of transport vesicles. Specifically, vesicles form at the transitional endoplasmic reticulum (tER), the rim of Golgi cisternae, the face of the Trans-Golgi Network (TGN), the plasma membrane (PM), and tubular extensions of the endosomes. Vesicle formation occurs when a region of membrane buds off from the donor organelle. The membrane-bound vesicle contains proteins to be transported and is surrounded by a proteinaceous coat, the components of which are recruited from the cytosol. Two different classes of coat protein have been identified. Clathrin coats form on vesicles derived from the TGN and PM, whereas coatomer (COP) coats form on vesicles derived from the ER and Golgi. COP coats can be further classified as COPI, involved in retrograde traffic through the Golgi and from the Golgi to the ER, and COPII, involved in anterograde traffic from the ER to the Golgi (Mellman, supra).
In clathrin-based vesicle formation, adapter proteins bring vesicle cargo and coat proteins together at the surface of the budding membrane. Adapter protein-1 and -2 select cargo from the TGN and plasma membrane, respectively, based on molecular information encoded on the cytoplasmic tail of integral membrane cargo proteins. Adapter proteins also recruit clathrin to the bud site. Clathrin is a protein complex consisting of three large and three small polypeptide chains arranged in a three-legged structure called a triskelion. Multiple triskelions and other coat proteins appear to self-assemble on the membrane to form a coated pit. This assembly process may serve to deform the membrane into a budding vesicle. GTP-bound ADP-ribosylation factor (Arf) is also incorporated into the coated assembly. Another small G-protein, dynamin, forms a ring complex around the neck of the forming vesicle and may provide the mechanochemical force to seal the bud, thereby releasing the vesicle. The coated vesicle complex is then transported through the cytosol. During the transport process, Arf-bound GTP is hydrolyzed to GDP, and the coat dissociates from the transport vesicle (West, M.A. et al. (1997) J. Cell Biol. 138:1239-1254).
Vesicles which bud from the ER and the Golgi are covered with a protein coat similar to the clathrin coat of endocytic and TGN vesicles. The coat protein (COP) is assembled from cytosolic precursor molecules at specific budding regions on the organelle. The COP coat consists of two major components, a G-protein (Arf or Sar) and coat protomer (coatomer). Coatomer is an equimolar complex of seven proteins, termed alpha-, beta-, beta'-, gamma-, delta-, epsilon- and zeta-COP. The coatomer complex binds to dilysine motifs contained on the cytoplasmic tails of integral membrane proteins. These include the KKXX retrieval motif of membrane proteins of the ER and dibasic/diphenylalanine motifs of members of the p24 family. The p24 family of type I membrane proteins represents the major membrane proteins of COPI vesicles (Harter, C. and F.T. Wieland (1998) Proc. Natl. Acad. Sci. USA 95:11649-11654).
Organelle Associated Molecules
Eukaryotic cells are organized into various cellular organelles, which has the effect of separating specific molecules and their functions from one another and from the cytosol. Within the cell, various membrane structures surround and define these organelles while allowing them to interact with one another and the cell environment through both active and passive transport processes. Important cell organelles include the nucleus, the Golgi apparatus, the endoplasmic reticulum, mitochondria, peroxisomes, lysosomes, endosomes, and secretory vesicles.
Nucleus
The cell nucleus contains all of the genetic information of the cell in the form of DNA, and the components and machinery necessary for replication of DNA and for transcription of DNA into RNA. (See Alberts, B. et al. (1994) Molecular Biology of the Cell, Garland Publishing Inc., New York NY, pp. 335-399.) DNA is organized into compact structures in the nucleus by interactions with various DNA-binding proteins such as histones and non-histone chromosomal proteins.
DNA-specific nucleases, DNAses, partially degrade these compacted structures prior to DNA replication or transcription. DNA replication takes place with the aid of DNA helicases which unwind the double-stranded DNA helix, and DNA polymerases that duplicate the separated DNA strands.
Transcriptional regulatory proteins are essential for the control of gene expression. Some of these proteins function as transcription factors that initiate,, activate, repress, or terminate gene transcription. Transcription factors generaUy bind to the promoter, enhancer, and upstream regulatory regions of a gene in a sequence-specific manner, although some factors bind regulatory elements within or downstream of a gene's coding region. Transcription factors may bind to a specific region of DNA singly or as a complex with other accessory factors. (Reviewed in Lewin, B. (1990) Genes IN, Oxford University Press, New York NY, and CeU Press, Cambridge MA, pp. 554-570.) Many transcription factors incorporate DNA-binding structural motifs which comprise either α heHces or β sheets that bind to the major groove of DNA. Four weU-characterized structural motifs are heHx-turn- heHx, zinc finger, leucine zipper, and heHx-loop-heHx. Proteins containing these motifs may act alone as monomers, or they may form homo- or heterodimers that interact with DNA. Many neoplastic disorders in humans can be attributed to inappropriate gene expression.
Malignant cell growth may result from either excessive expression of tumor promoting genes or insufficient expression of tumor suppressor genes (Cleary, M.L. (1992) Cancer Surv. 15:89-104). Chromosomal translocations may also produce chimeric loci which fuse the coding sequence of one gene with the regulatory regions of a second unrelated gene. Such an arrangement likely results in inappropriate gene transcription, potentially contributing to malignancy.
In addition, the immune system responds to infection or trauma by activating a cascade of events that coordinate the progressive selection, amplification, and mobilization of cellular defense mechanisms. A complex and balanced program of gene activation and repression is involved in this process. However, hyperactivity of the immune system as a result of improper or insufficient regulation of gene expression may result in considerable tissue or organ damage. This damage is well documented in immunological responses associated with arthritis, allergens, heart attack, stroke, and infections (Isselbacher, K.J. et al. (1996) Harrison's Principles of Internal Medicine, 13/e, McGraw Hill, Inc. and Teton Data Systems Software).
Transcription of DNA into RNA also takes place in the nucleus, catalyzed by RNA polymerases. Three types of RNA polymerase exist. RNA polymerase I makes large ribosomal RNAs, while RNA polymerase III makes a variety of small, stable RNAs including 5S ribosomal RNA and the transfer RNAs (tRNA). RNA polymerase II transcribes genes that will be translated into proteins. The primary transcript of RNA polymerase II is called heterogeneous nuclear RNA (hnRNA), and must be further processed by splicing to remove non-coding sequences called introns. RNA splicing is mediated by small nuclear ribonucleoprotein complexes, or snRNPs, producing mature messenger RNA (mRNA) which is then transported out of the nucleus for translation into proteins.
Nucleolus
The nucleolus is a highly organized subcompartment in the nucleus that contains high concentrations of RNA and proteins and functions mainly in ribosomal RNA synthesis and assembly (Alberts, et al. supra, pp. 379-382). Ribosomal RNA (rRNA) is a structural RNA that is complexed with proteins to form ribonucleoprotein structures called ribosomes. Ribosomes provide the platform on which protein synthesis takes place.
Ribosomes are assembled in the nucleolus initially from a large, 45S rRNA combined with a variety of proteins imported from the cytoplasm, as well as smaller, 5S rRNAs. Later processing of the immature ribosome results in formation of smaller ribosomal subunits which are transported from the nucleolus to the cytoplasm where they are assembled into functional ribosomes.
Endoplasmic Reticulum
In eukaryotes, proteins are synthesized within the endoplasmic reticulum (ER), delivered from the ER to the Golgi apparatus for post-translational processing and sorting, and transported from the Golgi to specific intracellular and extracellular destinations. Synthesis of integral membrane proteins, secreted proteins, and proteins destined for the lumen of a particular organelle occurs on the rough endoplasmic reticulum (ER). The rough ER is so named because of the rough appearance in electron micrographs imparted by the attached ribosomes on which protein synthesis proceeds. Synthesis of proteins destined for the ER actually begins in the cytosol with the synthesis of a specific signal peptide which directs the growing polypeptide and its attached ribosome to the ER membrane where the signal peptide is removed and protein synthesis is completed. Soluble proteins destined for the ER lumen, for secretion, or for transport to the lumen of other organelles pass completely into the ER lumen. Transmembrane proteins destined for the ER or for other cell membranes are translocated across the ER membrane but remain anchored in the lipid bilayer of the membrane by one or more membrane-spanning α-helical regions. Translocated polypeptide chains destined for other organelles or for secretion also fold and assemble in the ER lumen with the aid of certain "resident" ER proteins. Protein folding in the ER is aided by two principal types of protein isomerases, protein disulfide isomerase (PDI), and peptidyl-prolyl isomerase (PPI). PDI catalyzes the oxidation of free sulfhydryl groups in cysteine residues to form intramolecular disulfide bonds in proteins. PPI, an enzyme that catalyzes the isomerization of certain proline imide bonds in oligopeptides and proteins, is considered to govern one of the rate limiting steps in the folding of many proteins to their final functional conformation.
The cyclophilins represent a major class of PPI that was originally identified as the major receptor for the immunosuppressive drug cyclosporin A (Handschumacher, R.E. et al. (1984) Science 226:544-547). Molecular "chaperones" such as BiP (binding protein) in the ER recognize incorrectly folded proteins as well as proteins not yet folded into their final form and bind to them, both to prevent improper aggregation between them, and to promote proper folding.
The "N-linked" glycosylation of most soluble secreted and membrane-bound proteins by oligosaccharides linked to asparagine residues in proteins is also performed in the ER. This reaction is catalyzed by a membrane-bound enzyme, oligosaccharyl transferase.
Golgi Apparatus
The Golgi apparatus is a complex structure that lies adjacent to the ER in eukaryotic cells and serves primarily as a sorting and dispatching station for products of the ER (Alberts, et al. supra, pp. 600-610). Additional posttranslational processing, principally additional glycosylation, also occurs in the Golgi. Indeed, the Golgi is a major site of carbohydrate synthesis, including most of the glycosaminoglycans of the extracellular matrix. N-linked oligosaccharides, added to proteins in the ER, are also further modified in the Golgi by the addition of more sugar residues to form complex N-linked oligosaccharides. "O-linked" glycosylation of proteins also occurs in the Golgi by the addition of N-acetylgalactosamine to the hydroxyl group of a serine or threonine residue followed by the sequential addition of other sugar residues to the first. This process is catalyzed by a series of glycosyltransferases each specific for a particular donor sugar nucleotide and acceptor molecule (Lodish, H. et al. (1995) Molecular Cell Biology, W.H. Freeman and Co., New York NY, pp. 700-708). In many cases, both N- and O-linked oligosaccharides appear to be required for the secretion of proteins or the movement of plasma membrane glycoproteins to the cell surface. The terminal compartment of the Golgi is the Trans-Golgi Network (TGN), where both membrane and lumenal proteins are sorted for their final destination. Transport (or secretory) vesicles destined for intracellular compartments, such as lysosomes, bud off of the TGN. Other transport vesicles bud off containing proteins destined for the plasma membrane, such as receptors, adhesion molecules, and ion channels, and secretory proteins, such as hormones, neurotransmitters, and digestive enzymes.
Vacuoles
The vacuole system is a collection of membrane bound compartments in eukaryotic cells that functions in the processes of endocytosis and exocytosis. They include phagosomes, lysosomes, endosomes, and secretory vesicles. Endocytosis is the process in cells of internalizing nutrients, solutes or small particles (pinocytosis) or large particles such as internalized receptors, viruses, bacteria, or bacterial toxins (phagocytosis). Exocytosis is the process of transporting molecules to the cell surface. It facilitates placement or localization of membrane-bound receptors or other membrane proteins and secretion of hormones, neurotransmitters, digestive enzymes, wastes, etc.
A common property of all of these vacuoles is an acidic pH environment ranging from approximately pH 4.5-5.0. This acidity is maintained by the presence of a proton ATPase that uses the energy of ATP hydrolysis to generate an electrochemical proton gradient across a membrane (Mellman, I. et al. (1986) Annu. Rev. Biochem. 55:663-700). Eukaryotic vacuolar proton ATPase (vp-ATPase) is a multimeric enzyme composed of 3-10 different subunits. One of these subunits is a highly hydrophobic polypeptide of approximately 16 kDa that is similar to the proteolipid component of vp-ATPases from eubacteria, fungi, and plant vacuoles (Mandel, M. et al. (1988) Proc. Natl. Acad. Sci. USA 85:5521-5524). The 16 kDa proteolipid component is the major subunit of the membrane portion of vp-ATPase and functions in the transport of protons across the membrane.
Lysosomes
Lysosomes are membranous vesicles containing various hydrolytic enzymes used for the controlled intracellular digestion of macromolecules. Lysosomes contain some 40 types of enzymes including proteases, nucleases, glycosidases, lipases, phospholipases, phosphatases, and sulfatases, all of which are acid hydrolases that function at a pH of about 5. Lysosomes are surrounded by a unique membrane containing transport proteins that allow the final products of macromolecule degradation, such as sugars, amino acids, and nucleotides, to be transported to the cytosol where they may be either excreted or reutilized by the cell. A vp-ATPase, such as that described above, maintains the acidic environment necessary for hydrolytic activity (Alberts, supra, pp. 610-611).
Endosomes
Endosomes are another type of acidic vacuole that is used to transport substances from the cell surface to the interior of the cell in the process of endocytosis. Like lysosomes, endosomes have an acidic environment provided by a vp-ATPase (Alberts et al. supra, pp. 610-618). Two types of endosomes are apparent based on tracer uptake studies that distinguish their time of formation in the cell and their cellular location. Early endosomes are found near the plasma membrane and appear to function primarily in the recycling of internalized receptors back to the cell surface. Late endosomes appear later in the endocytic process close to the Golgi apparatus and the nucleus, and appear to be associated with delivery of endocytosed material to lysosomes or to the TGN where they may be recycled. Specific proteins are associated with particular transport vesicles and their target compartments that may provide selectivity in targeting vesicles to their proper compartments. A cytosolic prenylated GTP-binding protein, Rab, is one such protein. Rabs 4, 5, and 11 are associated with the early endosome, whereas Rabs 7 and 9 associate with the late endosome.
Mitochondria
Mitochondria are oval-shaped organelles comprising an outer membrane, a tightly folded inner membrane, an intermembrane space between the outer and inner membranes, and a matrix inside the inner membrane. The outer membrane contains many porin molecules that allow ions and charged molecules to enter the intermembrane space, while the inner membrane contains a variety of transport proteins that transfer only selected molecules. Mitochondria are the primary sites of energy production in cells.
Energy is produced by the oxidation of glucose and fatty acids. Glucose is initially converted to pyruvate in the cytoplasm. Fatty acids and pyruvate are transported to the mitochondria for complete oxidation to CO2 coupled by enzymes to the transport of electrons from NADH and FADH2 to oxygen and to the synthesis of ATP (oxidative phosphorylation) from ADP and Pi.
Pyruvate is transported into the mitochondria and converted to acetyl-CoA for oxidation via the citric acid cycle, involving pyruvate dehydrogenase components, dihydrolipoyl transacetylase, and dihydrolipoyl dehydrogenase. Enzymes involved in the citric acid cycle include: citrate synthetase, aconitases, isocitrate dehydrogenase, alpha-ketoglutarate dehydrogenase complex including transsuccinylases, succinyl CoA synthetase, succinate dehydrogenase, fumarases, and malate dehydrogenase. Acetyl CoA is oxidized to CO2 with concomitant formation of NADH, FADH2, and GTP. In oxidative phosphorylation, the transfer of electrons from NADH and FADH2 to oxygen by dehydrogenases is coupled to the synthesis of ATP from ADP and Pi by the F0F1 ATPase complex in the mitochondrial inner membrane. Enzyme complexes responsible for electron transport and ATP synthesis include the F0F1 ATPase complex, ubiquinone(CoQ)-cytochrome c reductase, ubiquinone reductase, cytochrome b, cytochrome c FeS protein, and cytochrome c oxidase.
Peroxisomes
Peroxisomes, like mitochondria, are a major site of oxygen utilization. They contain one or more enzymes, such as catalase and urate oxidase, that use molecular oxygen to remove hydrogen atoms from specific organic substrates in an oxidative reaction that produces hydrogen peroxide (Alberts, supra, pp. 574-577). Catalase oxidizes a variety of substrates including phenols, formic acid, formaldehyde, and alcohol and is important in peroxisomes of liver and kidney cells for detoxifying various toxic molecules that enter the bloodstream. Another major function of oxidative reactions in peroxisomes is the breakdown of fatty acids in a process called β oxidation. β oxidation results in shortening of the alkyl chain of fatty acids by blocks of two carbon atoms that are converted to acetyl CoA and exported to the cytosol for reuse in biosynthetic reactions.
Also like mitochondria, peroxisomes import their proteins from the cytosol using a specific signal sequence located near the C-terminus of the protein. The importance of this import process is evident in the inherited human disease Zellweger syndrome, in which a defect in importing proteins into peroxisomes leads to a peroxisomal deficiency resulting in severe abnormalities in the brain, liver, and kidneys, and death soon after birth. One form of this disease has been shown to be due to a mutation in the gene encoding a peroxisomal integral membrane protein called peroxisome assembly factor-1.
The discovery of new human molecules satisfies a need in the art by providing new compositions which are useful in the diagnosis, study, prevention, and treatment of diseases associated with, as well as effects of exogenous compounds on, the expression of human molecules.
SUMMARY OF THE INVENTION
The present invention relates to nucleic acid sequences comprising human diagnostic and therapeutic polynucleotides (dithp) as presented in the Sequence Listing. The dithp uniquely identify genes encoding human structural, functional, and regulatory molecules.
The invention provides an isolated polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d). In one alternative, the polynucleotide comprises a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56. In another alternative, the polynucleotide comprises at least 30 contiguous nucleotides of a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; b) a polynucleotide comprising a naturally occurring polynucleotide comprising a polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d). In another alternative, the polynucleotide comprises at least 60 contiguous nucleotides of a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; b) a polynucleotide comprising a naturally occurring polynucleotide comprising a polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d).
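As an illustration only, the sequence relationships recited above (complementary polynucleotide, RNA equivalent, percent identity, and a minimum run of contiguous nucleotides) can be sketched in Python. The function names are hypothetical and do not come from the specification. The identity calculation assumes two pre-aligned sequences of equal length, which is a simplification; an alignment program would normally be used to establish identity.

```python
# Hypothetical helper names for illustration; not part of the specification.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def rna_equivalent(dna: str) -> str:
    """Return the RNA equivalent of a DNA sequence (T replaced by U)."""
    return dna.upper().replace("T", "U")


def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions in two pre-aligned, equal-length sequences.

    Assumption: no gaps and equal length; real comparisons need an alignment step.
    """
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def has_contiguous_fragment(candidate: str, reference: str, min_len: int = 30) -> bool:
    """True if candidate contains at least min_len contiguous nucleotides of reference."""
    candidate, reference = candidate.upper(), reference.upper()
    return any(
        reference[i:i + min_len] in candidate
        for i in range(len(reference) - min_len + 1)
    )
```

A candidate sequence would, under these assumptions, meet the "at least 90% identical" language when `percent_identity` returns 90.0 or more against the aligned reference.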
The invention further provides a composition for the detection of expression of human diagnostic and therapeutic polynucleotides comprising at least one isolated polynucleotide comprising a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d); and a detectable label.
The invention also provides a method for detecting a target polynucleotide in a sample, said target polynucleotide having a polynucleotide sequence of a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence of a polynucleotide selected from the group consisting of SEQ ID NO:1-56; b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d). The method comprises a) amplifying said target polynucleotide or fragment thereof using polymerase chain reaction amplification, and b) detecting the presence or absence of said amplified target polynucleotide or fragment thereof, and, optionally, if present, the amount thereof.
The invention also provides a method for detecting a target polynucleotide in a sample, said target polynucleotide having a polynucleotide sequence of a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d). The method comprises a) hybridizing the sample with a probe comprising at least 20 contiguous nucleotides comprising a sequence complementary to said target polynucleotide in the sample, and which probe specifically hybridizes to said target polynucleotide, under conditions whereby a hybridization complex is formed between said probe and said target polynucleotide, and b) detecting the presence or absence of said hybridization complex, and, optionally, if present, the amount thereof. In one alternative, the invention provides a composition comprising a target polynucleotide of the method, wherein said probe comprises at least 30 contiguous nucleotides. In one alternative, the invention provides a composition comprising a target polynucleotide of the method, wherein said probe comprises at least 60 contiguous nucleotides.
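The probe requirement above (at least 20 contiguous nucleotides complementary to the target) can be approximated computationally. The following is a minimal sketch with hypothetical function names. It treats hybridization as exact Watson-Crick complementarity over the full probe length, which is an assumption made for illustration; actual hybridization depends on conditions such as temperature and salt concentration and tolerates some mismatch.

```python
# Hypothetical helper names; exact-match complementarity is a simplifying assumption.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def probe_binding_sites(probe: str, target: str, min_len: int = 20) -> list[int]:
    """Return 0-based target positions where the probe is perfectly complementary.

    Raises ValueError if the probe is shorter than min_len nucleotides.
    """
    if len(probe) < min_len:
        raise ValueError(f"probe must contain at least {min_len} nucleotides")
    site = reverse_complement(probe)  # the target sequence the probe would pair with
    target = target.upper()
    positions = []
    start = target.find(site)
    while start != -1:
        positions.append(start)
        start = target.find(site, start + 1)
    return positions
```

An empty result would correspond, in this simplified model, to "absence" of a hybridization complex; setting `min_len` to 30 or 60 mirrors the longer probe alternatives.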
The invention further provides a recombinant polynucleotide comprising a promoter sequence operably linked to an isolated polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d). In one alternative, the invention provides a cell transformed with the recombinant polynucleotide. In another alternative, the invention provides a transgenic organism comprising the recombinant polynucleotide.
The invention also provides a method for producing a human diagnostic and therapeutic polypeptide, the method comprising a) culturing a cell under conditions suitable for expression of the human diagnostic and therapeutic polypeptide, wherein said cell is transformed with a recombinant polynucleotide, said recombinant polynucleotide comprising an isolated polynucleotide selected from the group consisting of i) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; ii) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; iii) a polynucleotide complementary to the polynucleotide of i); iv) a polynucleotide complementary to the polynucleotide of ii); and v) an RNA equivalent of i) through iv), and b) recovering the human diagnostic and therapeutic polypeptide so expressed. The invention additionally provides a method wherein the polypeptide has an amino acid sequence selected from the group consisting of SEQ ID NO:57-113.
The invention also provides an isolated human diagnostic and therapeutic polypeptide (DITHP) encoded by at least one polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56. The invention further provides a method of screening for a test compound that specifically binds to the polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:57-113. The method comprises a) combining the polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:57-113 with at least one test compound under suitable conditions, and b) detecting binding of the polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:57-113 to the test compound, thereby identifying a compound that specifically binds to the polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:57-113.
The invention further provides a microarray wherein at least one element of the microarray is an isolated polynucleotide comprising at least 30 contiguous nucleotides of a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d).
The invention also provides a method for generating a transcript image of a sample which contains polynucleotides. The method comprises a) labeling the polynucleotides of the sample, b) contacting the elements of the microarray with the labeled polynucleotides of the sample under conditions suitable for the formation of a hybridization complex, and c) quantifying the expression of the polynucleotides in the sample.
Additionally, the invention provides a method for screening a compound for effectiveness in altering expression of a target polynucleotide, wherein said target polynucleotide comprises a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; c) a polynucleotide complementary to the polynucleotide of a); d) a polynucleotide complementary to the polynucleotide of b); and e) an RNA equivalent of a) through d). The method comprises a) exposing a sample comprising the target polynucleotide to a compound, b) detecting altered expression of the target polynucleotide, and c) comparing the expression of the target polynucleotide in the presence of varying amounts of the compound and in the absence of the compound.
The invention further provides a method for assessing toxicity of a test compound, said method comprising a) treating a biological sample containing nucleic acids with the test compound; b) hybridizing the nucleic acids of the treated biological sample with a probe comprising at least 20 contiguous nucleotides of a polynucleotide selected from the group consisting of i) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; ii) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; iii) a polynucleotide complementary to the polynucleotide of i); iv) a polynucleotide complementary to the polynucleotide of ii); and v) an RNA equivalent of i) through iv). Hybridization occurs under conditions whereby a specific hybridization complex is formed between said probe and a target polynucleotide in the biological sample, said target polynucleotide comprising a polynucleotide sequence of a polynucleotide selected from the group consisting of i) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; ii) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56; iii) a polynucleotide complementary to the polynucleotide of i); iv) a polynucleotide complementary to the polynucleotide of ii); and v) an RNA equivalent of i) through iv), and alternatively, the target polynucleotide comprises a polynucleotide sequence of a fragment of a polynucleotide selected from the group consisting of i-v above; c) quantifying the amount of hybridization complex; and d) comparing the amount of hybridization complex in the treated biological sample with the amount of hybridization complex in an untreated biological sample, wherein a difference in the amount of hybridization complex in the treated biological sample is indicative of toxicity of the test compound.
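Steps c) and d) of the toxicity method amount to comparing a quantified signal between a treated and an untreated sample. A minimal sketch of such a comparison follows. The function names and the two-fold cutoff (`min_abs_log2=1.0`) are assumptions introduced here for illustration; the text above requires only "a difference" and does not specify a threshold or a statistic.

```python
import math


def log2_ratio(treated: float, untreated: float) -> float:
    """Log2 of the treated/untreated hybridization signal ratio.

    Both signals must be positive (for example, background-corrected intensities).
    """
    if treated <= 0 or untreated <= 0:
        raise ValueError("signals must be positive")
    return math.log2(treated / untreated)


def signal_differs(treated: float, untreated: float, min_abs_log2: float = 1.0) -> bool:
    """True if the signals differ by at least the chosen fold change.

    min_abs_log2=1.0 corresponds to a two-fold change in either direction;
    this cutoff is a hypothetical choice, not a value from the specification.
    """
    return abs(log2_ratio(treated, untreated)) >= min_abs_log2
```

In practice replicate measurements and a statistical test would normally replace a single fixed cutoff.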
The invention further provides an isolated polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:57-113. In one alternative, the invention provides an isolated polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:57-113.
The invention further provides an isolated polynucleotide encoding a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:57-113. In one alternative, the polynucleotide encodes a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:57-113. In another alternative, the polynucleotide comprises a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56.
Additionally, the invention provides an isolated antibody which specifically binds to a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:57-113.
The invention further provides a composition comprising a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, and a pharmaceutically acceptable excipient. In one embodiment, the composition comprises a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:57-113. The invention additionally provides a method of treating a disease or condition associated with decreased expression of functional DITHP, comprising administering to a patient in need of such treatment the composition.
The invention also provides a method for screening a compound for effectiveness as an agonist of a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:57-113. The method comprises a) exposing a sample comprising the polypeptide to a compound, and b) detecting agonist activity in the sample.
In one alternative, the invention provides a composition comprising an agonist compound identified by the method and a pharmaceutically acceptable excipient. In another alternative, the invention provides a method of treating a disease or condition associated with decreased expression of functional DITHP, comprising administering to a patient in need of such treatment the composition.
Additionally, the invention provides a method for screening a compound for effectiveness as an antagonist of a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:57-113. The method comprises a) exposing a sample comprising the polypeptide to a compound, and b) detecting antagonist activity in the sample. In one alternative, the invention provides a composition comprising an antagonist compound identified by the method and a pharmaceutically acceptable excipient. In another alternative, the invention provides a method of treating a disease or condition associated with overexpression of functional DITHP, comprising administering to a patient in need of such treatment the composition.
The invention further provides a method of screening for a compound that modulates the activity of a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:57-113. The method comprises a) combining the polypeptide with at least one test compound under conditions permissive for the activity of the polypeptide, b) assessing the activity of the polypeptide in the presence of the test compound, and c) comparing the activity of the polypeptide in the presence of the test compound with the activity of the polypeptide in the absence of the test compound, wherein a change in the activity of the polypeptide in the presence of the test compound is indicative of a compound that modulates the activity of the polypeptide.
DESCRIPTION OF THE TABLES
Table 1 shows the sequence identification numbers (SEQ ID NO:s) and template identification numbers (template IDs) corresponding to the polynucleotides of the present invention, along with the sequence identification numbers (SEQ ID NO:s) and open reading frame identification numbers (ORF IDs) corresponding to polypeptides encoded by the template ID.
Table 2 shows the sequence identification numbers (SEQ ID NO:s) and template identification numbers (template IDs) corresponding to the polynucleotides of the present invention, along with their GenBank hits (GI Numbers), probability scores, and functional annotations corresponding to the GenBank hits.
Table 3 shows the sequence identification numbers (SEQ ID NO:s) and template identification numbers (template IDs) corresponding to the polynucleotides of the present invention, along with polynucleotide segments of each template sequence as defined by the indicated "start" and "stop" nucleotide positions. The reading frames of the polynucleotide segments and the Pfam hits, Pfam descriptions, and E-values corresponding to the polypeptide domains encoded by the polynucleotide segments are indicated.
Table 4 shows the sequence identification numbers (SEQ ID NO:s) and template identification numbers (template IDs) corresponding to the polynucleotides of the present invention, along with polynucleotide segments of each template sequence as defined by the indicated "start" and "stop" nucleotide positions. The reading frames of the polynucleotide segments are shown, and the polypeptides encoded by the polynucleotide segments constitute either signal peptide (SP) or transmembrane (TM) domains, as indicated. For TM domains, the membrane topology of the encoded polypeptide sequence is indicated as being transmembrane or on the cytosolic or non-cytosolic side of the cell membrane or organelle.
Table 5 shows the sequence identification numbers (SEQ ID NO:s) and template identification numbers (template IDs) corresponding to the polynucleotides of the present invention, along with component sequence identification numbers (component IDs) corresponding to each template. The component sequences, which were used to assemble the template sequences, are defined by the indicated "start" and "stop" nucleotide positions along each template.
Table 6 shows the tissue distribution profiles for the templates of the invention.
Table 7 shows the sequence identification numbers (SEQ ID NO:s) corresponding to the polypeptides of the present invention, along with the reading frames used to obtain the polypeptide segments, the lengths of the polypeptide segments, the "start" and "stop" nucleotide positions of the polynucleotide sequences used to define the encoded polypeptide segments, the GenBank hits (GI Numbers), probability scores, and functional annotations corresponding to the GenBank hits.
Table 8 summarizes the bioinformatics tools which are useful for analysis of the polynucleotides of the present invention. The first column of Table 8 lists analytical tools, programs, and algorithms, the second column provides brief descriptions thereof, the third column presents appropriate references, all of which are incorporated by reference herein in their entirety, and the fourth column presents, where applicable, the scores, probability values, and other parameters used to evaluate the strength of a match between two sequences (the higher the score, the greater the homology between two sequences).
DETAILED DESCRIPTION OF THE INVENTION
Before the nucleic acid sequences and methods are presented, it is to be understood that this invention is not limited to the particular machines, methods, and materials described. Although particular embodiments are described, machines, methods, and materials similar or equivalent to these embodiments may be used to practice the invention. The preferred machines, methods, and materials set forth are not intended to limit the scope of the invention which is limited only by the appended claims.
The singular forms "a", "an", and "the" include plural reference unless the context clearly dictates otherwise. All technical and scientific terms have the meanings commonly understood by one of ordinary skill in the art. All publications are incorporated by reference for the purpose of describing and disclosing the cell lines, vectors, and methodologies which are presented and which might be used in connection with the invention. Nothing in the specification is to be construed as an admission that the invention is not entitled to antedate such disclosure by virtue of prior invention.
Definitions
As used herein, the lower case "dithp" refers to a nucleic acid sequence, while the upper case "DITHP" refers to an amino acid sequence encoded by dithp. A "full-length" dithp refers to a nucleic acid sequence containing the entire coding region of a gene endogenously expressed in human tissue.
"Adjuvants" are materials such as Freund's adjuvant, mineral gels (aluminum hydroxide), and surface active substances (lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, keyhole limpet hemocyanin, and dinitrophenol) which may be administered to increase a host's immunological response.
"Allele" refers to an alternative form of a nucleic acid sequence. AUeles result from a "mutation," a change or an alternative reading of the genetic code. Any given gene may have none, one, or many aHeHc forms. Mutations which give rise to aUeles include deletions, additions, or substitutions of nucleotides. Each of these changes may occur alone, or in combination with the others, one or more times in a given nucleic acid sequence. The present invention encompasses aUeHc dithp.
An "aUeHc variant" is an alternative form of the gene encoding DITHP. AHeHc variants may result from at least one mutation in the nucleic acid sequence and may result in altered mRNAs or in polypeptides whose structure or function may or may not be altered. A gene may have none, one, or many aHeHc variants of its nataraUy occurring form. Common mutational changes which give rise to aUeHc variants are generaUy ascribed to natural deletions, additions, or substitutions of nucleotides. Each of these types of changes may occur alone, or in combination with the others, one or more times in a given sequence. "Altered" nucleic acid sequences encoding DITHP include those sequences with deletions, insertions, or substitutions of different nucleotides, resulting in a polypeptide the same as DITHP or a polypeptide with at least one functional characteristic of DITHP. Included within this definition are polymorphisms which may or may not be readily detectable using a particular oHgonucleotide probe of the polynucleotide encoding DITHP, and improper or unexpected hybridization to aHeHc variants, with a locus other than the normal chromosomal locus for the polynucleotide sequence encoding DITHP.
The encoded protein may also be "altered," and may contain deletions, insertions, or substitutions of amino acid residues which produce a silent change and result in a functionally equivalent DITHP. Deliberate amino acid substitutions may be made on the basis of similarity in polarity, charge, solubility, hydrophobicity, hydrophilicity, and/or the amphipathic nature of the residues, as long as the biological or immunological activity of DITHP is retained. For example, negatively charged amino acids may include aspartic acid and glutamic acid, and positively charged amino acids may include lysine and arginine. Amino acids with uncharged polar side chains having similar hydrophilicity values may include: asparagine and glutamine; and serine and threonine. Amino acids with uncharged side chains having similar hydrophilicity values may include: leucine, isoleucine, and valine; glycine and alanine; and phenylalanine and tyrosine.
"Amino acid sequence" refers to a peptide, a polypeptide, or a protein of either natural or synthetic origin. The amino acid sequence is not limited to the complete, endogenous amino acid sequence and may be a fragment, epitope, variant, or derivative of a protein expressed by a nucleic acid sequence. "AmpHfication" refers to the production of additional copies of a sequence and is carried out using polymerase chain reaction (PCR) technologies weU known in the art.
"Antibody" refers to intact molecules as weU as to fragments thereof, such as Fab, F(ab')2, and Fv fragments, which are capable of binding the epitopic determinant. Antibodies that bind DITHP polypeptides can be prepared using intact polypeptides or using fragments containing smaU peptides of interest as the immunizing antigen. The polypeptide or peptide used to immunize an animal (e.g., a mouse, a rat, or a rabbit) can be derived from the translation of RNA, or synthesized chemicaUy, and can be conjugated to a carrier protein if desired. Commonly used carriers that are chemicaUy coupled to peptides include bovine serum albumin, thyroglobulin, and keyhole limpet hemocyanin (KLH). The coupled peptide is then used to immunize the animal. The term "aptamer" refers to a nucleic acid or oHgonucleotide molecule that binds to a specific molecular target. Aptamers are derived from an in vitro evolutionary process (e.g., SELEX (Systematic Evolution of Ligands by Exponential Enrichment), described in U.S. Patent No. 5,270,163), which selects for target-specific aptamer sequences from large combinatorial Hbraries. Aptamer compositions may be double-stranded or single-stranded, and may include deoxyribonucleotides, ribonucleotides, nucleotide derivatives, or other nucleotide-Hke molecules. The nucleotide components of an aptamer may have modified sugar groups (e.g., the 2'-OH group of a ribonucleotide may be replaced by 2'-F or 2 -NH2), which may improve a desired property, e.g., resistance to nucleases or longer Hfetime in blood. Aptamers may be conjugated to other molecules, e.g., a high molecular weight carrier to slow clearance of the aptamer from the circulatory system. Aptamers may be specificaUy cross-linked to their cognate Hgands, e.g., by photo-activation of a cross-linker. (See, e.g., Brody, E.N. and L. Gold (2000) J. Biotechnol. 74:5-13.)
The term "intramer" refers to an aptamer which is expressed in vivo. For example, a vaccinia virus-based RNA expression system has been used to express specific RNA aptamers at high levels in the cytoplasm of leukocytes (Blind, M. et al. (1999) Proc. Natl Acad. Sci. USA 96:3606-3610). The term "spiegelmer" refers to an aptamer which includes L-DNA, L-RNA, or other left- handed nucleotide derivatives or nucleotide-Hke molecules. Aptamers containing left-handed nucleotides are resistant to degradation by nataraUy occuning enzymes, which normaUy act on substrates containing right-handed nucleotides.
"Antisense sequence" refers to a sequence capable of specificaUy hybridizing to a target sequence. The antisense sequence may include DNA, RNA, or any nucleic acid mimic or analog such as peptide nucleic acid (PNA); oHgonucleotides having modified backbone linkages such as phosphorothioates, methylphosphonates, orbenzylphosphonates; oHgonucleotides having modified sugar groups such as 2'-methoxyethyl sugars or 2'-methoxyethoxy sugars; or oHgonucleotides having modified bases such as 5-methyl cytosine, 2'-deoxyuracil, or 7-deaza-2'-deoxyguanosine. "Antisense technology" refers to any technology which reHes on the specific hybridization of an antisense sequence to a target sequence.
A "bin" is a portion of computer memory space used by a computer program for storage of data, and bounded in such a manner that data stored in a bin may be retrieved by the program.
"BiologicaUy active" refers to an amino acid sequence having a structural, regulatory, or biochemical function of a nataraUy occuning amino acid sequence.
"Clone joining" is a process for combining gene bins based upon the bins' containing sequence information from the same clone. The sequences may assemble into a primary gene transcript as weU as one or more spHce variants.
"Complementary" describes the relationship between two single-stranded nucleic acid sequences that anneal by base-pairing (5 -A-G-T-3' pairs with its complement 3'-T-C-A-5').
A "component sequence" is a nucleic acid sequence selected by a computer program such as PHRED and used to assemble a consensus or template sequence from one or more component sequences.
A "consensus sequence" or "template sequence" is a nucleic acid sequence which has been assembled from overlapping sequences, using a computer program for fragment assembly such as the GELVIEW fragment assembly system (Genetics Computer Group (GCG), Madison WI) or using a relational database management system (RDMS).
"Conservative amino acid substitutions" are those substitations that, when made, least interfere with the properties of the original protein, i.e., the structare and especiaUy the function of the protein is conserved and not significantly changed by such substitutions. The table below shows amino acids wliich may be substituted for an original amino acid in a protein and which are regarded as conservative substitutions. Original Residue Conservative Substitution
Ala Gly, Ser
Arg His, Lys
Asn Asp, Gin, His
Asp Asn, Glu
Cys Ala, Ser
Gin Asn, Glu, His
Glu Asp, Gin, His
Gly Ala
His Asn, Arg, Gin, Glu fie Leu, Val
Leu De, Val
Lys Arg, Gin, Glu
Met Leu, He
Phe His, Met, Leu, Trp, Tyr
Ser Cys, Thr
Thr Ser, Val
Trp Phe, Tyr
Tyr His, Phe, Trp
Val lie, Leu, Thr
Conservative substitutions generally maintain (a) the structure of the polypeptide backbone in the area of the substitution, for example, as a beta sheet or alpha helical conformation, (b) the charge or hydrophobicity of the molecule at the target site, or (c) the bulk of the side chain.
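As an illustration, the substitution table above can be held as a lookup structure. The following Python sketch is an assumption-laden example: the names CONSERVATIVE and is_conservative are hypothetical, and residues are written in the standard three-letter codes (for example Gln and Ile).

```python
# Original residue -> residues regarded as conservative substitutions,
# transcribed from the table above. Names here are hypothetical.
CONSERVATIVE = {
    "Ala": {"Gly", "Ser"},
    "Arg": {"His", "Lys"},
    "Asn": {"Asp", "Gln", "His"},
    "Asp": {"Asn", "Glu"},
    "Cys": {"Ala", "Ser"},
    "Gln": {"Asn", "Glu", "His"},
    "Glu": {"Asp", "Gln", "His"},
    "Gly": {"Ala"},
    "His": {"Asn", "Arg", "Gln", "Glu"},
    "Ile": {"Leu", "Val"},
    "Leu": {"Ile", "Val"},
    "Lys": {"Arg", "Gln", "Glu"},
    "Met": {"Leu", "Ile"},
    "Phe": {"His", "Met", "Leu", "Trp", "Tyr"},
    "Ser": {"Cys", "Thr"},
    "Thr": {"Ser", "Val"},
    "Trp": {"Phe", "Tyr"},
    "Tyr": {"His", "Phe", "Trp"},
    "Val": {"Ile", "Leu", "Thr"},
}

def is_conservative(original: str, replacement: str) -> bool:
    """True if the table lists `replacement` for `original`."""
    return replacement in CONSERVATIVE.get(original, set())

print(is_conservative("Cys", "Ala"))  # listed in the table
print(is_conservative("Ala", "Cys"))  # not listed: the table is directional
```

The lookup is keyed on the original residue because the table is not symmetric; Cys may be replaced by Ala, but Ala is not listed as replaceable by Cys.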
"Deletion" refers to a change in either a nucleic or amino acid sequence in which at least one nucleotide or amino acid residue, respectively, is absent.
"Derivative" refers to the chemical modification of a nucleic acid sequence, such as by replacement of hydrogen by an alkyl, acyl, amino, hydroxyl, or other group. "Differential expression" refers to increased or upregulated; or decreased, downregulated, or absent gene or protein expression, determined by comparing at least two different samples. Such comparisons maybe carried out between, for example, a treated and an untreated sample, or a diseased and a normal sample.
The terms "element" and "anay element" refer to a polynucleotide, polypeptide, or other chemical compound having a unique and defined position on a microanay.
The term "modulate" refers to a change in the activity of DITHP. For example, modulation may cause an increase or a decrease in protein activity, binding characteristics, or any other biological, functional, or immunological properties of DITHP.
"E- value" refers to die statistical probabiHty that a match between two sequences occuned by chance. xon s u ing re ers o e recom ina on o i eren co ing reg ons exons . ince an exon may represent a structural or functional domain of the encoded protein, new proteins may be assembled through the novel reassortment of stable substructures, thus aUowing acceleration of the evolution of new protein functions. 5 A "fragment" is a unique portion of dithp or DITHP which is identical in sequence to but shorter in length than the parent sequence. A fragment may comprise up to the entire length of die defined sequence, minus one nucleotide/amino acid residue. For example, a fragment may comprise from 10 to 1000 contiguous amino acid residues or nucleotides. A fragment used as a probe, primer, antigen, therapeutic molecule, or for other purposes, maybe at least 5, 10, 15, 16, 20, 25, 30, 40, 50, 60, o 75, 100, 150, 250 or at least 500 contiguous amino acid residues or nucleotides in length. Fragments maybe preferentiaUy selected from certain regions of a molecule. For example, a polypeptide fragment may comprise a certain length of contiguous amino acids selected from the first 250 or 500 amino acids (or first 25% or 50%) of a polypeptide as shown in a certain defined sequence. Clearly these lengths are exemplary, and any length that is supported by the specification, including the 5 Sequence Listing and the figures, may be encompassed by the present embodiments.
A fragment of dithp comprises a region of unique polynucleotide sequence that specifically identifies dithp, for example, as distinct from any other sequence in the same genome. A fragment of dithp is useful, for example, in hybridization and amplification technologies and in analogous methods that distinguish dithp from related polynucleotide sequences. The precise length of a fragment of dithp and the region of dithp to which the fragment corresponds are routinely determinable by one of ordinary skill in the art based on the intended purpose for the fragment.
A fragment of DITHP is encoded by a fragment of dithp. A fragment of DITHP comprises a region of unique amino acid sequence that specifically identifies DITHP. For example, a fragment of DITHP is useful as an immunogenic peptide for the development of antibodies that specifically recognize DITHP. The precise length of a fragment of DITHP and the region of DITHP to which the fragment corresponds are routinely determinable by one of ordinary skill in the art based on the intended purpose for the fragment.
A "fuU length" nucleotide sequence is one containing at least a start site for translation to a protein sequence, foUowed by an open reading frame and a stop site, and encoding a "full length" 0 polypeptide.
"Hit" refers to a sequence whose annotation wiUbe used to describe a given template. Criteria for selecting the top hit are as foUows: if the template has one or more exact nucleic acid matches, the top hit is the exact match with highest percent identity. If the template has no exact matches but has significant protein hits, the top hit is the protein hit with the lowest E- value. H the template has no significant protein hits, but does have significant non-exact nucleotide hits, the top hit is the nucleotide hit with the lowest E- value.
"Homology" refers to sequence similarity either between a reference nucleic acid sequence and at least a fragment of a dithp or between a reference amino acid sequence and a fragment of a 5 DITHP.
"Hybridization" refers to the process by which a strand of nucleotides anneals with a complementary strand through base pairing. Specific hybridization is an indication that two nucleic acid sequences share a high degree of identity. Specific hybridization complexes form under defined annealing conditions, and remain hybridized after the "washing" step. The defined hybridization 0 conditions include the annealing conditions and the washing step(s), the latter of which is particularly important in determining the stringency of the hybridization process, with more stringent conditions aHowing less non-specific binding, i.e., binding between pairs of nucleic acid probes that are not perfectly matched. Permissive conditions for anneaHng of nucleic acid sequences are routinely determinable and may be consistent among hybridization experiments, whereas wash conditions may 5 be varied among experiments to achieve the desired stringency.
Generally, stringency of hybridization is expressed with reference to the temperature under which the wash step is carried out. Generally, such wash temperatures are selected to be about 5°C to 20°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe. An equation for calculating Tm and conditions for nucleic acid hybridization is well known and can be found in Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual, 2nd ed., vol. 1-3, Cold Spring Harbor Press, Plainview NY; specifically see volume 2, chapter 9.
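The relationship between Tm and the wash temperature can be shown with a small worked example in Python. The sketch assumes the Tm has already been obtained from the cited reference or by measurement; it does not reproduce any Tm equation, and the function name is hypothetical.

```python
from typing import Tuple

def wash_temperature_range(tm_celsius: float,
                           min_offset: float = 5.0,
                           max_offset: float = 20.0) -> Tuple[float, float]:
    """Return (lowest, highest) wash temperature, taken as about
    5 to 20 degrees C below the supplied Tm."""
    return (tm_celsius - max_offset, tm_celsius - min_offset)

# For a probe with a Tm of 80 degrees C, washes would fall between 60 and 75.
low, high = wash_temperature_range(80.0)
print(low, high)
```

A wash nearer the upper end of the range is closer to the Tm and is therefore the more stringent choice.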
High stringency conditions for hybridization between polynucleotides of the present invention include wash conditions of 68°C in the presence of about 0.2 x SSC and about 0.1% SDS, for 1 hour. Alternatively, temperatures of about 65°C, 60°C, or 55°C may be used. SSC concentration may be varied from about 0.2 to 2 x SSC, with SDS being present at about 0.1%. Typically, blocking reagents are used to block non-specific hybridization. Such blocking reagents include, for instance, denatured salmon sperm DNA at about 100-200 μg/ml. Useful variations on these conditions will be readily apparent to those skilled in the art. Hybridization, particularly under high stringency conditions, may be suggestive of evolutionary similarity between the nucleotides. Such similarity is strongly indicative of a similar role for the nucleotides and their resultant proteins.
Other parameters, such as temperature, salt concentration, and detergent concentration may be varied to achieve the desired stringency. Denaturants, such as formamide at a concentration of about 35-50% v/v, may also be used under particular circumstances, such as RNA:DNA hybridizations. Appropriate hybridization conditions are routinely determinable by one of ordinary skill in the art.
"ImmunologicaUy active" or "immunogenic'' describes the potential for a natural, recombinant, or synthetic peptide, epitope, polypeptide, or protein to induce antibody production in appropriate animals, ceUs, or ceU Hues.
"Immune response" can refer to conditions associated with inflammation, trauma, immune disorders, or infectious or genetic disease, etc. These conditions can be characterized by expression of various factors, e.g., cytokines, chemokines, and other signaling molecules, which may affect ceUular and systemic defense systems. An "immunogenic fragment" is a polypeptide or oHgopeptide fragment of Dithp which is capable of eliciting an immune response when introduced into a Hving organism, for example, a mammal. The term "immunogenic fragment" also includes any polypeptide or oHgopeptide fragment of DITHP which is useful in any of the antibody production methods disclosed herein or known in the art. "Insertion" or "addition" refers to a change in either a nucleic or amino acid sequence in wliich at least one nucleotide or residue, respectively, is added to the sequence.
"Labeling" refers to the covalent or nonco valent joining of a polynucleotide, polypeptide, or antibody with a reporter molecule capable of producing a detectable or measurable signal.
"Microanay" is any anangement of nucleic acids, amino acids, antibodies, etc., on a substrate. The substrate may be a soHd support such as beads, glass, paper, nitroceUulose, nylon, or an appropriate membrane.
"Linkers" are short stretches of nucleotide sequence which may be added to a vector or a dithp to create restriction endonuclease sites to faciHtate cloning. "Polylinkers" are engineered to incorporate multiple restriction enzyme sites and to provide for the use of enzymes which leave 5' or 3' overhangs (e.g., BamHI, EcoRI, and HindlH) and those which provide blunt ends (e.g., EcoRV, SnaBI, and Stul).
"NataraUy occurring" refers to an endogenous polynucleotide or polypeptide that maybe isolated from viruses or prokaryotic or eukaryotic ceUs.
"Nucleic acid sequence" refers to the specific order of nucleotides joined by phosphodiester bonds in a linear, polymeric anangement. Depending on the number of nucleotides, the nucleic acid sequence can be considered an oHgomer, oHgonucleotide, or polynucleotide. The nucleic acid can be DNA, RNA, or any nucleic acid analog, such as PNA, maybe of genomic or synthetic origin, maybe either double-stranded or single-stranded, and can represent either the sense or antisense (complementary) strand. gomer re ers to a nuc e c ac sequence o at east a out nuc eo es an as many as about 60 nucleotides, preferably about 15 to 40 nucleotides, and most preferably between about 20 and
30 nucleotides, that may be used in hybridization or ampHfication technologies. OHgomers may be used as, e.g., primers for PCR, and are usuaHy chemicaUy synthesized. 5 "Operably linked" refers to the situation in which a first nucleic acid sequence is placed in a functional relationship with the second nucleic acid sequence. For instance, a promoter is operably linked to a coding sequence if the promoter affects the transcription or expression of the coding sequence. GeneraUy, operably linked DNA sequences may be in close proximity or contiguous and, where necessary to join two protein coding regions, in the same reading frame. 0 "Peptide nucleic acid" (PNA) refers to a DNA mimic in which nucleotide bases are attached to a pseudopepti.de backbone to increase stability. PNAs, also designated antigene agents, can prevent gene expression by targeting complementary messenger RNA.
The phrases "percent identity" and "% identity", as appHed to polynucleotide sequences, refer to the percentage of residue matches between at least two polynucleotide sequences aHgned using a 5 standardized algorithm. Such an algorithm may insert, in a standardized and reproducible way, gaps in the sequences being compared in order to optimize aHgnment between two sequences, and therefore achieve a more meaningful comparison of the two sequences.
Percent identity between polynucleotide sequences may be determined using the default parameters of the CLUSTAL V algorithm as incorporated into the MEGALIGN version 3.12e sequence alignment program. This program is part of the LASERGENE software package, a suite of molecular biological analysis programs (DNASTAR, Madison WI). CLUSTAL V is described in Higgins, D.G. and Sharp, P.M. (1989) CABIOS 5:151-153 and in Higgins, D.G. et al. (1992) CABIOS 8:189-191. For pairwise alignments of polynucleotide sequences, the default parameters are set as follows: Ktuple=2, gap penalty=5, window=4, and "diagonals saved"=4. The "weighted" residue weight table is selected as the default. Percent identity is reported by CLUSTAL V as the "percent similarity" between aligned polynucleotide sequence pairs.
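To make the notion of percent identity concrete, the following Python sketch counts residue matches over a pairwise alignment that has already been produced by some alignment program. It is not the CLUSTAL V or MEGALIGN calculation; in particular, the choice of denominator (every alignment column in which at least one sequence has a residue) is an assumption, and other tools may normalize differently.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns in which both sequences carry the
    same residue. Inputs must be equal-length, gap-padded strings."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns where both sequences have a gap.
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if not (x == gap and y == gap)]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)

# Three of four columns match.
print(percent_identity("ACGT", "ACGA"))
```

Because gap columns count against identity here, inserting gaps to optimize an alignment does not inflate the reported percentage.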
Alternatively, a suite of commonly used and freely available sequence comparison algorithms is provided by the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST) (Altschul, S.F. et al. (1990) J. Mol. Biol. 215:403-410), which is available from several sources, including the NCBI, Bethesda, MD, and on the Internet at http://www.ncbi.nlm.nih.gov/BLAST/. The BLAST software suite includes various sequence analysis programs including "blastn," that is used to determine alignment between a known polynucleotide sequence and other sequences on a variety of databases. Also available is a tool called "BLAST 2 Sequences" that is used for direct pairwise comparison of two nucleotide sequences. "BLAST 2 Sequences" can be accessed and used interactively at http://www.ncbi.nlm.nih.gov/gorf/bl2/. The "BLAST 2 Sequences" tool can be used for both blastn and blastp (discussed below). BLAST programs are commonly used with gap and other parameters set to default settings. For example, to compare two nucleotide sequences, one may use blastn with the "BLAST 2 Sequences" tool Version 2.0.9 (May-07-1999) set at default parameters. Such default parameters may be, for example:
Matrix: BLOSUM62
Reward for match: 1
Penalty for mismatch: -2
Open Gap: 5 and Extension Gap: 2 penalties
Gap x drop-off: 50
Expect: 10
Word Size: 11
Filter: on
Percent identity may be measured over the length of an entire defined sequence, for example, as defined by a particular SEQ ID number, or may be measured over a shorter length, for example, over the length of a fragment taken from a larger, defined sequence, for instance, a fragment of at least 20, at least 30, at least 40, at least 50, at least 70, at least 100, or at least 200 contiguous nucleotides. Such lengths are exemplary only, and it is understood that any fragment length supported by the sequences shown herein, in figures or Sequence Listings, may be used to describe a length over which percentage identity may be measured.
Nucleic acid sequences that do not show a high degree of identity may nevertheless encode similar amino acid sequences due to the degeneracy of the genetic code. It is understood that changes in nucleic acid sequence can be made using this degeneracy to produce multiple nucleic acid sequences that all encode substantially the same protein.
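The effect of the degeneracy of the genetic code can be illustrated with a brief Python sketch. The two nine-nucleotide sequences are invented examples, not sequences of the invention, and the helper name translate is hypothetical; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna: str) -> str:
    """Translate a coding sequence read in frame from its first base;
    any trailing partial codon is ignored."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

# Invented example: the sequences differ at 4 of 9 positions
# yet encode the same tripeptide, Met-Ala-Ser.
seq_1 = "ATGGCTTCA"
seq_2 = "ATGGCCAGC"
print(translate(seq_1), translate(seq_2))
```

Here Ala is encoded by either GCT or GCC and Ser by either TCA or AGC, so nucleotide identity falls while the encoded polypeptide is unchanged.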
The phrases "percent identity" and "% identity", as appHed to polypeptide sequences, refer to 5 the percentage of residue matches between at least two polypeptide sequences aHgned using a standardized algorithm. Methods of polypeptide sequence aHgnment are weU-known. Some aHgnment methods take into account conservative amino acid substitations. Such conservative substitations, explained in more detail above, generaUy preserve the hydrophobicity and acidity of the substituted residue, thus preserving the structure (and therefore function) of the folded polypeptide. o Percent identity between polypeptide sequences may be determined using the default parameters of the CLUSTAL V algorithm as incorporated into the MEGALIGN version 3.12e sequence aHgnment program (described and referenced above). For pairwise aHgnments of polypeptide sequences using CLUSTAL V, the default parameters are set as foUows: Ktaple=l, gap penalty=3, window=5, and "diagonals saved"=5. The PAM250 matrix is selected as the default resi ue weig a e. s wi po ynuc eo e a gnmen s, e percen i en y s repor e y
CLUSTAL V as the "percent similarity" between aHgned polypeptide sequence pairs.
Alternatively the NCBI BLAST software suite may be used. For example, for a pairwise comparison of two polypeptide sequences, one may use the "BLAST 2 Sequences" tool Version 2.0.9 (May-07-1999) with blastp set at default parameters. Such default parameters may be, for example:
Matrix: BLOSUM62
Open Gap: 11 and Extension Gap: 1 penalty
Gap x drop-off: 50
Expect: 10
Word Size: 3
Filter: on
Percent identity may be measured over the length of an entire defined polypeptide sequence, for example, as defined by a particular SEQ ID number, or may be measured over a shorter length, for example, over the length of a fragment taken from a larger, defined polypeptide sequence, for instance, a fragment of at least 15, at least 20, at least 30, at least 40, at least 50, at least 70 or at least 150 contiguous residues. Such lengths are exemplary only, and it is understood that any fragment length supported by the sequences shown herein, in figures or Sequence Listings, may be used to describe a length over which percentage identity may be measured.
"Post-translational modification" of a DITHP may involve lipidation, glycosylation, phosphorylation, acetylation, racemization, proteolytic cleavage, and other modifications known in the art. These processes may occur synthetically or biochemically. Biochemical modifications will vary by cell type depending on the enzymatic milieu and the DITHP.
"Probe" refers to dithp or fragments thereof, which are used to detect identical, allelic or related nucleic acid sequences. Probes are isolated oligonucleotides or polynucleotides attached to a detectable label or reporter molecule. Typical labels include radioactive isotopes, ligands, chemiluminescent agents, and enzymes. "Primers" are short nucleic acids, usually DNA oligonucleotides, which may be annealed to a target polynucleotide by complementary base-pairing. The primer may then be extended along the target DNA strand by a DNA polymerase enzyme. Primer pairs can be used for amplification (and identification) of a nucleic acid sequence, e.g., by the polymerase chain reaction (PCR).
Probes and primers as used in the present invention typically comprise at least 15 contiguous nucleotides of a known sequence. In order to enhance specificity, longer probes and primers may also be employed, such as probes and primers that comprise at least 20, 30, 40, 50, 60, 70, 80, 90, 100, or at least 150 consecutive nucleotides of the disclosed nucleic acid sequences. Probes and primers may be considerably longer than these examples, and it is understood that any length supported by the specification, including the figures and Sequence Listing, may be used.
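As an illustration of the length criterion above, the following Python sketch reports the longest stretch of a candidate probe that occurs contiguously in a known sequence and compares it with a minimum length (15 by default). The function names are hypothetical, and the check considers only the strand as written; it does not examine the reverse complement or any hybridization property.

```python
def longest_shared_run(candidate, known):
    """Length of the longest contiguous stretch of `candidate` that occurs in `known`."""
    candidate, known = candidate.upper(), known.upper()
    best = 0
    for start in range(len(candidate)):
        # Only substrings longer than the current best are worth testing.
        end = start + best + 1
        while end <= len(candidate) and candidate[start:end] in known:
            best = end - start
            end += 1
    return best


def has_min_contiguous(candidate, known, min_len=15):
    """True if the candidate shares at least `min_len` contiguous nucleotides with `known`."""
    return longest_shared_run(candidate, known) >= min_len
```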
Methods for preparing and using probes and primers are described in the references, for example Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual, 2nd ed., vol. 1-3, Cold Spring Harbor Press, Plainview NY; Ausubel et al., 1987, Current Protocols in Molecular Biology, Greene Publ. Assoc. & Wiley-Intersciences, New York NY; Innis et al., 1990, PCR Protocols, A Guide to Methods and Applications, Academic Press, San Diego CA. PCR primer pairs can be derived from a known sequence, for example, by using computer programs intended for that purpose such as Primer (Version 0.5, 1991, Whitehead Institute for Biomedical Research, Cambridge MA).
Oligonucleotides for use as primers are selected using software known in the art for such purpose. For example, OLIGO 4.06 software is useful for the selection of PCR primer pairs of up to 100 nucleotides each, and for the analysis of oligonucleotides and larger polynucleotides of up to 5,000 nucleotides from an input polynucleotide sequence of up to 32 kilobases. Similar primer selection programs have incorporated additional features for expanded capabilities. For example, the PrimOU primer selection program (available to the public from the Genome Center at University of Texas South West Medical Center, Dallas TX) is capable of choosing specific primers from megabase sequences and is thus useful for designing primers on a genome-wide scope. The Primer3 primer selection program (available to the public from the Whitehead Institute/MIT Center for Genome Research, Cambridge MA) allows the user to input a "mispriming library," in which sequences to avoid as primer binding sites are user-specified. Primer3 is useful, in particular, for the selection of oligonucleotides for microarrays. (The source code for the latter two primer selection programs may also be obtained from their respective sources and modified to meet the user's specific needs.)
The PrimeGen program (available to the public from the UK Human Genome Mapping Project Resource Centre, Cambridge UK) designs primers based on multiple sequence alignments, thereby allowing selection of primers that hybridize to either the most conserved or least conserved regions of aligned nucleic acid sequences. Hence, this program is useful for identification of both unique and conserved oligonucleotides and polynucleotide fragments. The oligonucleotides and polynucleotide fragments identified by any of the above selection methods are useful in hybridization technologies, for example, as PCR or sequencing primers, microarray elements, or specific probes to identify fully or partially complementary polynucleotides in a sample of nucleic acids. Methods of oligonucleotide selection are not limited to those described above.
"Purified" refers to molecules, either polynucleotides or polypeptides, that are isolated or separated from their natural environment and are at least 60% free, preferably at least 75% free, and most preferably at least 90% free from other compounds with which they are naturally associated.
A "recombinant nucleic acid" is a sequence that is not naturally occurring or has a sequence that is made by an artificial combination of two or more otherwise separated segments of sequence. This artificial combination is often accomplished by chemical synthesis or, more commonly, by the artificial manipulation of isolated segments of nucleic acids, e.g., by genetic engineering techniques such as those described in Sambrook, supra. The term recombinant includes nucleic acids that have been altered solely by addition, substitution, or deletion of a portion of the nucleic acid. Frequently, a recombinant nucleic acid may include a nucleic acid sequence operably linked to a promoter sequence.
Such a recombinant nucleic acid may be part of a vector that is used, for example, to transform a cell. Alternatively, such recombinant nucleic acids may be part of a viral vector, e.g., based on a vaccinia virus, that could be used to vaccinate a mammal wherein the recombinant nucleic acid is expressed, inducing a protective immunological response in the mammal.
"Regulatory element" refers to a nucleic acid sequence from nontranslated regions of a gene, and includes enhancers, promoters, introns, and 3' untranslated regions, which interact with host proteins to carry out or regulate transcription or translation.
"Reporter" molecules are chemical or biochemical moieties used for labeling a nucleic acid, an amino acid, or an antibody. They include radionuclides; enzymes; fluorescent, chemiluminescent, or chromogenic agents; substrates; cofactors; inhibitors; magnetic particles; and other moieties known in the art.
An "RNA equivalent," in reference to a DNA sequence, is composed of the same linear sequence of nucleotides as the reference DNA sequence with the exception that all occurrences of the nitrogenous base thymine are replaced with uracil, and the sugar backbone is composed of ribose instead of deoxyribose.
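At the level of sequence text, the definition above amounts to a single base substitution. The short Python sketch below shows it; the function name is hypothetical, and the change of sugar backbone from deoxyribose to ribose is a chemical property that a character string does not represent.

```python
def rna_equivalent(dna):
    """Return the RNA equivalent of a DNA sequence string: every thymine (T) becomes uracil (U)."""
    return dna.upper().replace("T", "U")
```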
"Sample" is used in its broadest sense. Samples may contain nucleic or amino acids, antibodies, or other materials, and may be derived from any source (e.g., bodily fluids including, but not limited to, saliva, blood, and urine; chromosome(s), organelles, or membranes isolated from a cell; genomic DNA, RNA, or cDNA in solution or bound to a substrate; and cleared cells or tissues or blots or imprints from such cells or tissues).
"Specific binding" or "specifically binding" refers to the interaction between a protein or peptide and its agonist, antibody, antagonist, or other binding partner. The interaction is dependent upon the presence of a particular structure of the protein, e.g., the antigenic determinant or epitope, recognized by the binding molecule. For example, if an antibody is specific for epitope "A," the presence of a polypeptide containing epitope A, or the presence of free unlabeled A, in a reaction containing free labeled A and the antibody will reduce the amount of labeled A that binds to the antibody.
"Substitution" refers to the replacement of at least one nucleotide or amino acid by a different nucleotide or amino acid.
"Substrate" refers to any suitable rigid or semi-rigid support including, e.g., membranes, filters, chips, slides, wafers, fibers, magnetic or nonmagnetic beads, gels, tubing, plates, polymers, microparticles or capillaries. The substrate can have a variety of surface forms, such as wells, trenches, pins, channels and pores, to which polynucleotides or polypeptides are bound.
A "transcript image" or "expression profile" refers to the collective pattern of gene expression by a particular cell type or tissue under given conditions at a given time.
"Transformation" refers to a process by which exogenous DNA enters a recipient cell. Transformation may occur under natural or artificial conditions using various methods well known in the art. Transformation may rely on any known method for the insertion of foreign nucleic acid sequences into a prokaryotic or eukaryotic host cell. The method is selected based on the host cell being transformed.
"Transformants" include stably transformed cells in which the inserted DNA is capable of replication either as an autonomously replicating plasmid or as part of the host chromosome, as well as cells which transiently express inserted DNA or RNA.
A "transgenic organism," as used herein, is any organism, including but not limited to animals and plants, in which one or more of the cells of the organism contains heterologous nucleic acid introduced by way of human intervention, such as by transgenic techniques well known in the art. The nucleic acid is introduced into the cell, directly or indirectly by introduction into a precursor of the cell, by way of deliberate genetic manipulation, such as by microinjection or by infection with a recombinant virus. The term genetic manipulation does not include classical cross-breeding, or in vitro fertilization, but rather is directed to the introduction of a recombinant DNA molecule. The transgenic organisms contemplated in accordance with the present invention include bacteria, cyanobacteria, fungi, and plants and animals. The isolated DNA of the present invention can be introduced into the host by methods known in the art, for example infection, transfection, transformation or transconjugation. Techniques for transferring the DNA of the present invention into such organisms are widely known and provided in references such as Sambrook et al. (1989), supra.
A "variant" of a particular nucleic acid sequence is defined as a nucleic acid sequence having at least 25% sequence identity to the particular nucleic acid sequence over a certain length of one of the nucleic acid sequences using blastn with the "BLAST 2 Sequences" tool Version 2.0.9 (May-07-1999) set at default parameters. Such a pair of nucleic acids may show, for example, at least 30%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% or greater sequence identity over a certain defined length. The variant may result in "conservative" amino acid changes which do not affect structural and/or chemical properties. A variant may be described as, for example, an "allelic" (as defined above), "splice," "species," or "polymorphic" variant. A splice variant may have significant identity to a reference molecule, but will generally have a greater or lesser number of polynucleotides due to alternate splicing of exons during mRNA processing. The corresponding polypeptide may possess additional functional domains or lack domains that are present in the reference molecule. Species variants are polynucleotide sequences that vary from one species to another. The resulting polypeptides generally will have significant amino acid identity relative to each other. A polymorphic variant is a variation in the polynucleotide sequence of a particular gene between individuals of a given species. Polymorphic variants also may encompass "single nucleotide polymorphisms" (SNPs) in which the polynucleotide sequence varies by one base. The presence of SNPs may be indicative of, for example, a certain population, a disease state, or a propensity for a disease state.
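The notion of a sequence that "varies by one base" can be made concrete with a small Python sketch that lists the positions at which two equal-length, already aligned polynucleotide strings differ. The function name is hypothetical, and the sketch handles substitutions only; insertions and deletions would require an alignment step that is not shown here.

```python
def single_base_differences(reference, variant):
    """List (1-based position, reference base, variant base) for each mismatching position.

    Assumption: the two sequences are already aligned and of equal length,
    so only substitutions are detected.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    return [
        (i + 1, r, v)
        for i, (r, v) in enumerate(zip(reference.upper(), variant.upper()))
        if r != v
    ]
```

A result containing exactly one entry corresponds to a pair of sequences differing by a single base over the compared length.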
In an alternative, variants of the polynucleotides of the present invention may be generated through recombinant methods. One possible method is a DNA shuffling technique such as MOLECULARBREEDING (Maxygen Inc., Santa Clara CA; described in U.S. Patent Number 5,837,458; Chang, C-C. et al. (1999) Nat. Biotechnol. 17:793-797; Christians, F.C. et al. (1999) Nat. Biotechnol. 17:259-264; and Crameri, A. et al. (1996) Nat. Biotechnol. 14:315-319) to alter or improve the biological properties of DITHP, such as its biological or enzymatic activity or its ability to bind to other molecules or compounds. DNA shuffling is a process by which a library of gene variants is produced using PCR-mediated recombination of gene fragments. The library is then subjected to selection or screening procedures that identify those gene variants with the desired properties. These preferred variants may then be pooled and further subjected to recursive rounds of DNA shuffling and selection/screening. Thus, genetic diversity is created through "artificial" breeding and rapid molecular evolution. For example, fragments of a single gene containing random point mutations may be recombined, screened, and then reshuffled until the desired properties are optimized. Alternatively, fragments of a given gene may be recombined with fragments of homologous genes in the same gene family, either from the same or different species, thereby maximizing the genetic diversity of multiple naturally occurring genes in a directed and controllable manner.
A "variant" of a particular polypeptide sequence is defined as a polypeptide sequence having at least 40% sequence identity to the particular polypeptide sequence over a certain length of one of the polypeptide sequences using blastp with the "BLAST 2 Sequences" tool Version 2.0.9 (May-07-1999) set at default parameters. Such a pair of polypeptides may show, for example, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% or greater identity over a certain defined length of one of the polypeptides.
In a particular embodiment, cDNA sequences derived from human tissues and cell lines were aligned based on nucleotide sequence identity and assembled into "consensus" or "template" sequences which are designated by the template identification numbers (template IDs) in column 2 of Table 2. The sequence identification numbers (SEQ ID NO:s) corresponding to the template IDs are shown in column 1. The template sequences have similarity to GenBank sequences, or "hits," as designated by the GI Numbers in column 3. The statistical probability of each GenBank hit is indicated by a probability score in column 4, and the functional annotation corresponding to each GenBank hit is listed in column 5.
The invention incorporates the nucleic acid sequences of these templates as disclosed in the Sequence Listing and the use of these sequences in the diagnosis and treatment of disease states characterized by defects in human molecules. The invention further utilizes these sequences in hybridization and amplification technologies, and in particular, in technologies which assess gene expression patterns correlated with specific cells or tissues and their responses in vivo or in vitro to pharmaceutical agents, toxins, and other treatments. In this manner, the sequences of the present invention are used to develop a transcript image for a particular cell or tissue.
Derivation of Nucleic Acid Sequences
cDNA was isolated from libraries constructed using RNA derived from normal and diseased human tissues and cell lines. The human tissues and cell lines used for cDNA library construction were selected from a broad range of sources to provide a diverse population of cDNAs representative of gene transcription throughout the human body. Descriptions of the human tissues and cell lines used for cDNA library construction are provided in the LIFESEQ database (Incyte Genomics, Inc. (Incyte), Palo Alto CA). Human tissues were broadly selected from, for example, cardiovascular, dermatologic, endocrine, gastrointestinal, hematopoietic/immune system, musculoskeletal, neural, reproductive, and urologic sources.
Cell lines used for cDNA library construction were derived from, for example, leukemic cells, teratocarcinomas, neuroepitheliomas, cervical carcinoma, lung fibroblasts, and endothelial cells. Such cell lines include, for example, THP-1, Jurkat, HUVEC, hNT2, WI38, HeLa, and other cell lines commonly used and available from public depositories (American Type Culture Collection, Manassas VA). Prior to mRNA isolation, cell lines were untreated, treated with a pharmaceutical agent such as 5'-aza-2'-deoxycytidine, treated with an activating agent such as lipopolysaccharide in the case of leukocytic cell lines, or, in the case of endothelial cell lines, subjected to shear stress.
Sequencing of the cDNAs
Methods for DNA sequencing are well known in the art. Conventional enzymatic methods employ the Klenow fragment of DNA polymerase I, SEQUENASE DNA polymerase (U.S. Biochemical Corporation, Cleveland OH), Taq polymerase (Applied Biosystems, Foster City CA), thermostable T7 polymerase (Amersham Pharmacia Biotech, Inc. (Amersham Pharmacia Biotech), Piscataway NJ), or combinations of polymerases and proofreading exonucleases such as those found in the ELONGASE amplification system (Life Technologies Inc. (Life Technologies), Gaithersburg MD), to extend the nucleic acid sequence from an oligonucleotide primer annealed to the DNA template of interest. Methods have been developed for the use of both single-stranded and double-stranded templates. Chain termination reaction products may be electrophoresed on urea-polyacrylamide gels and detected either by autoradiography (for radioisotope-labeled nucleotides) or by fluorescence (for fluorophore-labeled nucleotides). Automated methods for mechanized reaction preparation, sequencing, and analysis using fluorescence detection methods have been developed. Machines used to prepare cDNAs for sequencing can include the MICROLAB 2200 liquid transfer system (Hamilton Company (Hamilton), Reno NV), Peltier thermal cycler (PTC200; MJ Research, Inc. (MJ Research), Watertown MA), and ABI CATALYST 800 thermal cycler (Applied Biosystems). Sequencing can be carried out using, for example, the ABI 373 or 377 (Applied Biosystems) or MEGABACE 1000 (Molecular Dynamics, Inc. (Molecular Dynamics), Sunnyvale CA) DNA sequencing systems, or other automated and manual sequencing systems well known in the art.
The nucleotide sequences of the Sequence Listing have been prepared by current, state-of-the-art, automated methods and, as such, may contain occasional sequencing errors or unidentified nucleotides. Such unidentified nucleotides are designated by an N. These infrequent unidentified bases do not represent a hindrance to practicing the invention for those skilled in the art. Several methods employing standard recombinant techniques may be used to correct errors and complete the missing sequence information. (See, e.g., those described in Ausubel, F.M. et al. (1997) Short Protocols in Molecular Biology, John Wiley & Sons, New York NY; and Sambrook, J. et al. (1989) Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Press, Plainview NY.)
Assembly of cDNA Sequences
Human polynucleotide sequences may be assembled using programs or algorithms well known in the art. Sequences to be assembled are related, wholly or in part, and may be derived from a single or many different transcripts. Assembly of the sequences can be performed using such programs as PHRAP (Phil's Revised Assembly Program) and the GELVIEW fragment assembly system (GCG), or other methods known in the art.
Alternatively, cDNA sequences are used as "component" sequences that are assembled into "template" or "consensus" sequences as follows. Sequence chromatograms are processed, verified, and quality scores are obtained using PHRED. Raw sequences are edited using an editing pathway known as Block 1 (See, e.g., the LIFESEQ Assembled User Guide, Incyte Genomics, Palo Alto, CA). A series of BLAST comparisons is performed and low-information segments and repetitive elements (e.g., dinucleotide repeats, Alu repeats, etc.) are replaced by "n's", or masked, to prevent spurious matches. Mitochondrial and ribosomal RNA sequences are also removed. The processed sequences are then loaded into a relational database management system (RDMS) which assigns edited sequences to existing templates, if available. When additional sequences are added into the RDMS, a process is initiated which modifies existing templates or creates new templates from works in progress (i.e., nonfinal assembled sequences) containing queued sequences or the sequences themselves. After the new sequences have been assigned to templates, the templates can be merged into bins. If multiple templates exist in one bin, the bin can be split and the templates reannotated.
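The masking step mentioned above (replacing repetitive elements by "n's") can be sketched in Python for the simplest case, tandem dinucleotide repeats. This is an illustrative sketch only and is not the Block 1 editing pathway: the function name, the regular expression, and the threshold of six tandem copies are all assumptions. Note that a run of a single base (for example twelve A's) also matches, because "AA" repeated six times is a tandem two-base unit.

```python
import re


def mask_dinucleotide_repeats(seq, min_repeats=6):
    """Replace tandem dinucleotide repeats with 'n' characters of equal length.

    Assumption: a repeat is any two-base unit occurring at least
    `min_repeats` times in immediate succession.
    """
    # One two-base unit, followed by at least (min_repeats - 1) further copies.
    pattern = re.compile(r"([ACGT]{2})\1{%d,}" % (min_repeats - 1), re.IGNORECASE)
    return pattern.sub(lambda m: "n" * len(m.group(0)), seq)
```

Masking with a same-length run of "n" keeps all downstream coordinates unchanged, so positions in the masked sequence still refer to the same bases as in the raw read.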
Once gene bins have been generated based upon sequence alignments, bins are "clone joined" based upon clone information. Clone joining occurs when the 5' sequence of one clone is present in one bin and the 3' sequence from the same clone is present in a different bin, indicating that the two bins should be merged into a single bin. Only bins which share at least two different clones are merged.
A resultant template sequence may contain either a partial or a full length open reading frame, or all or part of a genetic regulatory element. This variation is due in part to the fact that the full length cDNAs of many genes are several hundred, and sometimes several thousand, bases in length. With current technology, cDNAs comprising the coding regions of large genes cannot be cloned because of vector limitations, incomplete reverse transcription of the mRNA, or incomplete "second strand" synthesis. Template sequences may be extended to include additional contiguous sequences derived from the parent RNA transcript using a variety of methods known to those of skill in the art.
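The clone-joining rule can be sketched in Python as follows. The sketch assumes each bin is described by the set of clone identifiers that have a 5' or 3' read in that bin, and it merges any two bins sharing at least two clones. The data layout, the function name, and the use of a union-find structure (which makes merging transitive, so that bins A and C end up together if A joins B and B joins C) are assumptions for illustration; the text does not say how chains of joinable bins are handled.

```python
from itertools import combinations


def clone_join(bin_clones, min_shared=2):
    """Group bins that share at least `min_shared` clones.

    bin_clones maps a bin identifier to the set of clone identifiers having
    a read in that bin. Returns a list of sets of bin identifiers.
    """
    parent = {b: b for b in bin_clones}

    def find(b):
        # Follow parent links to the representative, compressing the path.
        while parent[b] != b:
            parent[b] = parent[parent[b]]
            b = parent[b]
        return b

    for a, b in combinations(bin_clones, 2):
        if len(bin_clones[a] & bin_clones[b]) >= min_shared:
            parent[find(a)] = find(b)

    groups = {}
    for b in bin_clones:
        groups.setdefault(find(b), set()).add(b)
    return list(groups.values())
```

Requiring two shared clones rather than one guards against a single chimeric or mislabeled clone causing two unrelated bins to be merged.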
Extension may thus be used to achieve the full length coding sequence of a gene.
Analysis of the cDNA Sequences
The cDNA sequences are analyzed using a variety of programs and algorithms which are well known in the art. (See, e.g., Ausubel, 1997, supra, Chapter 7.7; Meyers, R.A. (Ed.) (1995) Molecular Biology and Biotechnology, Wiley VCH, New York NY, pp. 856-853; and Table 8.) These analyses comprise both reading frame determinations, e.g., based on triplet codon periodicity for particular organisms (Fickett, J.W. (1982) Nucleic Acids Res. 10:5303-5318); analyses of potential start and stop codons; and homology searches. Computer programs known to those of skill in the art for performing computer-assisted searches for amino acid and nucleic acid sequence similarity, include, for example, Basic Local Alignment Search Tool (BLAST; Altschul, S.F. (1993) J. Mol. Evol. 36:290-300; Altschul, S.F. et al. (1990) J. Mol. Biol. 215:403-410). BLAST is especially useful in determining exact matches and comparing two sequence fragments of arbitrary but equal lengths, whose alignment is locally maximal and for which the alignment score meets or exceeds a threshold or cutoff score set by the user (Karlin, S. et al. (1988) Proc. Natl. Acad. Sci. USA 85:841-845). Using an appropriate search tool (e.g., BLAST or HMM), GenBank, SwissProt, BLOCKS, PFAM and other databases may be searched for sequences containing regions of homology to a query dithp or DITHP of the present invention.
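The analysis of potential start and stop codons can be illustrated with a minimal Python sketch that scans the three forward reading frames of a cDNA string for stretches beginning with ATG and ending at the first in-frame stop codon. It is a simplification under stated assumptions: the function name and `min_codons` threshold are hypothetical, only the forward strand is scanned, the standard stop codons TAA, TAG, and TGA are used, and the codon-periodicity method of Fickett cited above is not implemented.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def find_orfs(seq, min_codons=2):
    """Return (frame, start, end) for each ATG-to-stop stretch on the forward strand.

    frame is 0, 1, or 2; start is the 0-based index of the A in ATG; end is
    exclusive and includes the stop codon. min_codons counts codons before
    the stop, including the start codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs
```

A template with a start codon but no in-frame stop yields no entry here, which corresponds to the partial open reading frames that the specification says a template may contain.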
Other approaches to the identification, assembly, storage, and display of nucleotide and polypeptide sequences are provided in "Relational Database for Storing Biomolecule Information," U.S.S.N. 08/947,845, filed October 9, 1997; "Project-Based Full-Length Biomolecular Sequence Database," U.S.S.N. 08/811,758, filed March 6, 1997; and "Relational Database and System for Storing Information Relating to Biomolecular Sequences," U.S.S.N. 09/034,807, filed March 4, 1998, all of which are incorporated by reference herein in their entirety.
Protein hierarchies can be assigned to the putative encoded polypeptide based on, e.g., motif, BLAST, or biological analysis. Methods for assigning these hierarchies are described, for example, in "Database System Employing Protein Function Hierarchies for Viewing Biomolecular Sequence Data," U.S.S.N. 08/812,290, filed March 6, 1997, incorporated herein by reference.
Identification of Human Diagnostic and Therapeutic Molecules Encoded by dithp
The identities of the DITHP encoded by the dithp of the present invention were obtained by analysis of the assembled cDNA sequences.
SEQ ID NO:57 and SEQ ID NO:58, encoded by SEQ ID NO:1 and SEQ ID NO:2, respectively, are, for example, human enzyme molecules.
SEQ ID NO:59, SEQ ID NO:60, and SEQ ID NO:61, encoded by SEQ ID NO:3, SEQ ID NO:4, and SEQ ID NO:5, respectively, are, for example, receptor molecules.
SEQ ID NO:62 and SEQ ID NO:63, encoded by SEQ ID NO:6 and SEQ ID NO:7, respectively, are, for example, intracellular signaling molecules.
SEQ ID NO:64, SEQ ID NO:65, SEQ ID NO:66, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, and SEQ ID NO:70, encoded by SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, and SEQ ID NO:14, respectively, are, for example, transcription factor molecules.
SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:74, SEQ ID NO:75, SEQ ID NO:76, SEQ ID NO:77, SEQ ID NO:78, SEQ ID NO:79, SEQ ID NO:80, SEQ ID NO:81, SEQ ID NO:82, SEQ ID NO:83, SEQ ID NO:84, SEQ ID NO:85, SEQ ID NO:86, SEQ ID NO:87, and SEQ ID NO:88, encoded by SEQ ID NO:15, SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:30, SEQ ID NO:31, and SEQ ID NO:32, respectively, are, for example, Zn finger-type transcriptional regulators.
SEQ ID NO:89 and SEQ ID NO:90, encoded by SEQ ID NO:33 and SEQ ID NO:34, respectively, are, for example, membrane transport molecules.
SEQ ID NO:91, SEQ ID NO:92, SEQ ID NO:93, and SEQ ID NO:94, encoded by SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, and SEQ ID NO:38, respectively, are, for example, protein modification and maintenance molecules.
SEQ ID NO:95, encoded by SEQ ID NO:39 is, for example, an adhesion molecule.
SEQ ID NO:96 and SEQ ID NO:97, encoded by SEQ ID NO:40 and SEQ ID NO:41, respectively, are, for example, antigen recognition molecules.
SEQ ID NO:98, encoded by SEQ ID NO:42 is, for example, an electron transfer associated molecule.
SEQ ID NO:99 and SEQ ID NO:100, encoded by SEQ ID NO:43 and SEQ ID NO:44, respectively, are, for example, cytoskeletal molecules.
SEQ ID NO:101 and SEQ ID NO:102, encoded by SEQ ID NO:45 and SEQ ID NO:46, respectively, are, for example, human cell membrane molecules.
SEQ ID NO:103, SEQ ID NO:104, SEQ ID NO:105, SEQ ID NO:106, and SEQ ID NO:107, encoded by SEQ ID NO:47, SEQ ID NO:48, SEQ ID NO:49, SEQ ID NO:50, and SEQ ID NO:50, respectively, are, for example, organelle associated molecules.
SEQ ID NO:108 and SEQ ID NO:109, encoded by SEQ ID NO:51 and SEQ ID NO:52, respectively, are, for example, biochemical pathway molecules.
SEQ ID NO:110, SEQ ID NO:111, SEQ ID NO:112, and SEQ ID NO:113, encoded by SEQ ID NO:53, SEQ ID NO:54, SEQ ID NO:55, and SEQ ID NO:56, respectively, are, for example, molecules associated with growth and development.
Sequences of Human Diagnostic and Therapeutic Molecules
The dithp of the present invention may be used for a variety of diagnostic and therapeutic purposes. For example, a dithp may be used to diagnose a particular condition, disease, or disorder associated with human molecules. Such conditions, diseases, and disorders include, but are not limited to, a cell proliferative disorder, such as actinic keratosis, arteriosclerosis, atherosclerosis, bursitis, cirrhosis, hepatitis, mixed connective tissue disease (MCTD), myelofibrosis, paroxysmal nocturnal hemoglobinuria, polycythemia vera, psoriasis, primary thrombocythemia, and cancers including adenocarcinoma, leukemia, lymphoma, melanoma, myeloma, sarcoma, teratocarcinoma, and, in particular, a cancer of the adrenal gland, bladder, bone, bone marrow, brain, breast, cervix, gall bladder, ganglia, gastrointestinal tract, heart, kidney, liver, lung, muscle, ovary, pancreas, parathyroid, penis, prostate, salivary glands, skin, spleen, testis, thymus, thyroid, and uterus; an autoimmune/inflammatory disorder, such as inflammation, actinic keratosis, acquired immunodeficiency syndrome (AIDS), Addison's disease, adult respiratory distress syndrome, allergies, ankylosing spondylitis, amyloidosis, anemia, arteriosclerosis, asthma, atherosclerosis, autoimmune hemolytic anemia, autoimmune thyroiditis, bronchitis, bursitis, cholecystitis, cirrhosis, contact dermatitis, Crohn's disease, atopic dermatitis, dermatomyositis, diabetes mellitus, emphysema, erythroblastosis fetalis, erythema nodosum, atrophic gastritis, glomerulonephritis, Goodpasture's syndrome, gout, Graves' disease, Hashimoto's thyroiditis, paroxysmal nocturnal hemoglobinuria, hepatitis, hypereosinophilia, irritable bowel syndrome, episodic lymphopenia with lymphocytotoxins, mixed connective tissue disease (MCTD), multiple sclerosis, myasthenia gravis, myocardial or pericardial inflammation, myelofibrosis, osteoarthritis, osteoporosis, pancreatitis, polycythemia vera, polymyositis, psoriasis, Reiter's syndrome, rheumatoid arthritis, 
scleroderma, Sjogren's syndrome, systemic anaphylaxis, systemic lupus erythematosus, systemic sclerosis, primary thrombocythemia, thrombocytopenic purpura, ulcerative colitis, uveitis, Werner syndrome, complications of cancer, hemodialysis, and extracorporeal circulation, trauma, and hematopoietic cancer including lymphoma, leukemia, and myeloma; an infection caused by a viral agent classified as adenovirus, arenavirus, bunyavirus, calicivirus, coronavirus, filovirus, hepadnavirus, herpesvirus, flavivirus, orthomyxovirus, parvovirus, papovavirus, paramyxovirus, picornavirus, poxvirus, reovirus, retrovirus, rhabdovirus, or togavirus; an infection caused by a bacterial agent classified as pneumococcus, staphylococcus, streptococcus, bacillus, corynebacterium, clostridium, meningococcus, gonococcus, listeria, moraxella, kingella, haemophilus, legionella, bordetella, gram-negative enterobacterium including shigella, salmonella, or campylobacter, pseudomonas, vibrio, brucella, francisella, yersinia, bartonella, norcardium, actinomyces, mycobacterium, spirochaetale, rickettsia, chlamydia, or mycoplasma; an infection caused by a fungal agent classified as aspergillus, blastomyces, dermatophytes, cryptococcus, coccidioides, malasezzia, histoplasma, or other mycosis-causing fungal agent; and an infection caused by a parasite classified as plasmodium or malaria-causing, parasitic entamoeba, leishmania, trypanosoma, toxoplasma, pneumocystis carinii, intestinal protozoa such as giardia, trichomonas, tissue nematode such as trichinella, intestinal nematode such as ascaris, lymphatic filarial nematode, trematode such as schistosoma, and cestode such as tapeworm; a developmental disorder such as renal tubular acidosis, anemia, Cushing's syndrome, achondroplastic dwarfism, Duchenne and Becker muscular dystrophy, epilepsy, gonadal dysgenesis, WAGR syndrome (Wilms' tumor, aniridia, genitourinary abnormalities, and mental retardation), Smith-Magenis syndrome, myelodysplastic syndrome, hereditary 
mucoepithelial dysplasia, hereditary keratodermas, hereditary neuropathies such as Charcot-Marie-Tooth disease and neurofibromatosis, hypothyroidism, hydrocephalus, seizure disorders such as Sydenham's chorea and cerebral palsy, spina bifida, anencephaly, craniorachischisis, congenital glaucoma, cataract, and sensorineural hearing loss; an endocrine disorder such as a disorder of the hypothalamus and/or pituitary resulting from lesions such as a primary brain tumor, adenoma, infarction associated with pregnancy, hypophysectomy, aneurysm, vascular malformation, thrombosis, infection, immunological disorder, and complication due to head trauma; a disorder associated with hypopituitarism including hypogonadism, Sheehan syndrome, diabetes insipidus, Kallman's disease, Hand-Schuller-Christian disease, Letterer-Siwe disease, sarcoidosis, empty sella syndrome, and dwarfism; a disorder associated with hyperpituitarism including acromegaly, giantism, and syndrome of inappropriate antidiuretic hormone (ADH) secretion (SIADH) often caused by benign adenoma; a disorder associated with hypothyroidism including goiter, myxedema, acute thyroiditis associated with bacterial infection, subacute thyroiditis associated with viral infection, autoimmune thyroiditis (Hashimoto's disease), and cretinism; a disorder associated with hyperthyroidism including thyrotoxicosis and its various forms, Graves' disease, pretibial myxedema, toxic multinodular goiter, thyroid carcinoma, and Plummer's disease; a disorder associated with hyperparathyroidism including Conn disease (chronic hypercalemia); a pancreatic disorder such as Type I or Type II diabetes mellitus and associated complications; a disorder associated with the adrenals such as hyperplasia, carcinoma, or adenoma of the adrenal cortex, hypertension associated with alkalosis, amyloidosis, hypokalemia, Cushing's disease, Liddle's syndrome, and Arnold-Healy-Gordon syndrome, pheochromocytoma tumors, and Addison's disease; a disorder associated 
with gonadal steroid hormones such as: in women, abnormal prolactin production, infertility, endometriosis, perturbation of the menstrual cycle, polycystic ovarian disease, hyperprolactinemia, isolated gonadotropin deficiency, amenorrhea, galactorrhea, hermaphroditism, hirsutism and virilization, breast cancer, and, in post-menopausal women, osteoporosis; and, in men, Leydig cell deficiency, male climacteric phase, and germinal cell aplasia, a hypergonadal disorder associated with Leydig cell tumors, androgen resistance associated with absence of androgen receptors, syndrome of 5 α-reductase, and gynecomastia; a metabolic disorder such as Addison's disease, cerebrotendinous xanthomatosis, congenital adrenal hyperplasia, coumarin resistance, cystic fibrosis, diabetes, fatty hepatocirrhosis, fructose-1,6-diphosphatase deficiency, galactosemia, goiter, glucagonoma, glycogen storage diseases, hereditary fructose intolerance, hyperadrenalism, hypoadrenalism, hyperparathyroidism, hypoparathyroidism, hypercholesterolemia, hyperthyroidism, hypoglycemia, hypothyroidism, hyperlipidemia, hyperlipemia, lipid myopathies, lipodystrophies, lysosomal storage diseases, mannosidosis, neuraminidase deficiency, obesity, pentosuria, phenylketonuria, and pseudovitamin D-deficiency rickets; disorders of carbohydrate metabolism such as congenital type II dyserythropoietic anemia, diabetes, insulin-dependent diabetes mellitus, non-insulin-dependent diabetes mellitus, fructose-1,6-diphosphatase deficiency, galactosemia, glucagonoma, hereditary fructose intolerance, hypoglycemia, mannosidosis, neuraminidase deficiency, obesity, galactose epimerase deficiency, glycogen storage diseases, lysosomal storage diseases, fructosuria, pentosuria, and inherited abnormalities of pyruvate metabolism; disorders of lipid metabolism such as fatty liver, cholestasis, primary biliary cirrhosis, carnitine deficiency, carnitine palmitoyltransferase deficiency, myoadenylate deaminase deficiency, hypertriglyceridemia, lipid storage 
disorders such as Fabry's disease, Gaucher's disease, Niemann-Pick's disease, metachromatic leukodystrophy, adrenoleukodystrophy, GM2 gangliosidosis, and ceroid lipofuscinosis, abetalipoproteinemia, Tangier disease, hyperlipoproteinemia, diabetes mellitus, lipodystrophy, lipomatoses, acute panniculitis, disseminated fat necrosis, adiposis dolorosa, lipoid adrenal hyperplasia, minimal change disease, lipomas, atherosclerosis, hypercholesterolemia, hypercholesterolemia with hypertriglyceridemia, primary hypoalphalipoproteinemia, hypothyroidism, renal disease, liver disease, lecithin:cholesterol acyltransferase deficiency, cerebrotendinous xanthomatosis, sitosterolemia, hypocholesterolemia, Tay-Sachs disease, Sandhoff's disease, hyperlipidemia, hyperlipemia, lipid myopathies, and obesity; and disorders of copper metabolism such as Menkes' disease, Wilson's disease, and Ehlers-Danlos syndrome type IX; a neurological disorder such as epilepsy, ischemic cerebrovascular disease, stroke, cerebral neoplasms, Alzheimer's disease, Pick's disease, Huntington's disease, dementia, Parkinson's disease and other extrapyramidal disorders, amyotrophic lateral sclerosis and other motor neuron disorders, progressive neural muscular atrophy, retinitis pigmentosa, hereditary ataxias, multiple sclerosis and other demyelinating diseases, bacterial and viral meningitis, brain abscess, subdural empyema, epidural abscess, suppurative intracranial thrombophlebitis, myelitis and radiculitis, viral central nervous system disease, prion diseases including kuru, Creutzfeldt-Jakob disease, and Gerstmann-Straussler-Scheinker syndrome, fatal familial insomnia, nutritional and metabolic diseases of the nervous system, neurofibromatosis, tuberous sclerosis, cerebelloretinal hemangioblastomatosis, encephalotrigeminal syndrome, mental retardation and other developmental disorder of the central nervous system, cerebral palsy, a neuroskeletal disorder, an autonomic nervous system disorder, a cranial nerve disorder, a spinal 
cord disease, muscular dystrophy and other neuromuscular disorder, a peripheral nervous system disorder, dermatomyositis and polymyositis, inherited, metabolic, endocrine, and toxic myopathy, myasthenia gravis, periodic paralysis, a mental disorder including mood, anxiety, and schizophrenic disorders, seasonal affective disorder (SAD), akathisia, amnesia, catatonia, diabetic neuropathy, tardive dyskinesia, dystonias, paranoid psychoses, postherpetic neuralgia, and Tourette's disorder; a gastrointestinal disorder including ulcerative colitis, gastric and duodenal ulcers, cystinuria, dibasicaminoaciduria, hypercystinuria, lysinuria, Hartnup disease, tryptophan malabsorption, methionine malabsorption, histidinuria, iminoglycinuria, dicarboxylicaminoaciduria, cystinosis, renal glycosuria, hypouricemia, familial hypophosphatemic rickets, congenital chloridorrhea, distal renal tubular acidosis, Menkes' disease, Wilson's disease, lethal diarrhea, juvenile pernicious anemia, folate malabsorption, adrenoleukodystrophy, hereditary myoglobinuria, and Zellweger syndrome; a transport disorder such as akinesia, amyotrophic lateral sclerosis, ataxia telangiectasia, cystic fibrosis, Becker's muscular dystrophy, Bell's palsy, Charcot-Marie-Tooth disease, diabetes mellitus, diabetes insipidus, diabetic neuropathy, Duchenne muscular dystrophy, hyperkalemic periodic paralysis, normokalemic periodic paralysis, Parkinson's disease, malignant hyperthermia, multidrug resistance, myasthenia gravis, myotonic dystrophy, catatonia, tardive dyskinesia, dystonias, peripheral neuropathy, cerebral neoplasms, prostate cancer, cardiac disorders associated with transport, e.g., angina, bradyarrhythmia, tachyarrhythmia, hypertension, Long QT syndrome, myocarditis, cardiomyopathy, nemaline myopathy, centronuclear myopathy, lipid myopathy, mitochondrial myopathy, thyrotoxic myopathy, ethanol myopathy, dermatomyositis, inclusion body myositis, infectious myositis, and polymyositis, neurological disorders 
associated with transport, e.g., Alzheimer's disease, amnesia, bipolar disorder, dementia, depression, epilepsy, Tourette's disorder, paranoid psychoses, and schizophrenia, and other disorders associated with transport, e.g., neurofibromatosis, postherpetic neuralgia, trigeminal neuropathy, sarcoidosis, sickle cell anemia, cataracts, infertility, pulmonary artery stenosis, sensorineural autosomal deafness, hyperglycemia, hypoglycemia, Graves' disease, goiter, glucose-galactose malabsorption syndrome, hypercholesterolemia, Cushing's disease, and Addison's disease; and a connective tissue disorder such as osteogenesis imperfecta, Ehlers-Danlos syndrome, chondrodysplasias, Marfan syndrome, Alport syndrome, familial aortic aneurysm, achondroplasia, mucopolysaccharidoses, osteoporosis, osteopetrosis, Paget's disease, rickets, osteomalacia, hyperparathyroidism, renal osteodystrophy, osteonecrosis, osteomyelitis, osteoma, osteoid osteoma, osteoblastoma, osteosarcoma, osteochondroma, chondroma, chondroblastoma, chondromyxoid fibroma, chondrosarcoma, fibrous cortical defect, nonossifying fibroma, fibrous dysplasia, fibrosarcoma, malignant fibrous histiocytoma, Ewing's sarcoma, primitive neuroectodermal tumor, giant cell tumor, osteoarthritis, rheumatoid arthritis, ankylosing spondyloarthritis, Reiter's syndrome, psoriatic arthritis, enteropathic arthritis, infectious arthritis, gout, gouty arthritis, calcium pyrophosphate crystal deposition disease, ganglion, synovial cyst, villonodular synovitis, systemic sclerosis, Dupuytren's contracture, hepatic fibrosis, lupus erythematosus, mixed connective tissue disease, epidermolysis bullosa simplex, bullous congenital ichthyosiform erythroderma (epidermolytic hyperkeratosis), non-epidermolytic and epidermolytic palmoplantar keratoderma, ichthyosis bullosa of Siemens, pachyonychia congenita, and white sponge nevus.
The dithp can be used to detect the presence of, or to quantify the amount of, a dithp-related polynucleotide in a sample. This information is then compared to information obtained from appropriate reference samples, and a diagnosis is established. Alternatively, a polynucleotide complementary to a given dithp can inhibit or inactivate a therapeutically relevant gene related to the dithp.
Analysis of dithp Expression Patterns
The expression of dithp may be routinely assessed by hybridization-based methods to determine, for example, the tissue-specificity, disease-specificity, or developmental stage-specificity of dithp expression. For example, the level of expression of dithp may be compared among different cell types or tissues, among diseased and normal cell types or tissues, among cell types or tissues at different developmental stages, or among cell types or tissues undergoing various treatments. This type of analysis is useful, for example, to assess the relative levels of dithp expression in fully or partially differentiated cells or tissues, to determine if changes in dithp expression levels are correlated with the development or progression of specific disease states, and to assess the response of a cell or tissue to a specific therapy, for example, in pharmacological or toxicological studies. Methods for the analysis of dithp expression are based on hybridization and amplification technologies and include membrane-based procedures such as northern blot analysis, high-throughput procedures that utilize, for example, microarrays, and PCR-based procedures.
Hybridization and Genetic Analysis
The dithp, their fragments, or complementary sequences, may be used to identify the presence of and/or to determine the degree of similarity between two (or more) nucleic acid sequences. The dithp may be hybridized to naturally occurring or recombinant nucleic acid sequences under appropriately selected temperatures and salt concentrations. Hybridization with a probe based on the nucleic acid sequence of at least one of the dithp allows for the detection of nucleic acid sequences, including genomic sequences, which are identical or related to the dithp of the Sequence Listing. Probes may be selected from non-conserved or unique regions of at least one of the polynucleotides of SEQ ID NO:1-56 and tested for their ability to identify or amplify the target nucleic acid sequence using standard protocols. Polynucleotide sequences that are capable of hybridizing, in particular, to those shown in SEQ ID NO:1-56 and fragments thereof, can be identified using various conditions of stringency. (See, e.g., Wahl, G.M. and S.L. Berger (1987) Methods Enzymol. 152:399-407; Kimmel, A.R. (1987) Methods Enzymol. 152:507-511.) Hybridization conditions are discussed in "Definitions."
A probe for use in Southern or northern hybridization may be derived from a fragment of a dithp sequence, or its complement, that is up to several hundred nucleotides in length and is either single-stranded or double-stranded. Such probes may be hybridized in solution to biological materials such as plasmids, bacterial, yeast, or human artificial chromosomes, cleared or sectioned tissues, or to artificial substrates containing dithp. Microarrays are particularly suitable for identifying the presence of and detecting the level of expression for multiple genes of interest by examining gene expression correlated with, e.g., various stages of development, treatment with a drug or compound, or disease progression. An array analogous to a dot or slot blot may be used to arrange and link polynucleotides to the surface of a substrate using one or more of the following: mechanical (vacuum), chemical, thermal, or UV bonding procedures. Such an array may contain any number of dithp and may be produced by hand or by using available devices, materials, and machines.
Microarrays may be prepared, used, and analyzed using methods known in the art. (See, e.g., Brennan, T.M. et al. (1995) U.S. Patent No. 5,474,796; Schena, M. et al. (1996) Proc. Natl. Acad. Sci. USA 93:10614-10619; Baldeschweiler et al. (1995) PCT application WO95/251116; Shalon, D. et al. (1995) PCT application WO95/35505; Heller, R.A. et al. (1997) Proc. Natl. Acad. Sci. USA 94:2150-2155; and Heller, M.J. et al. (1997) U.S. Patent No. 5,605,662.)
Probes may be labeled by either PCR or enzymatic techniques using a variety of commercially available reporter molecules. For example, commercial kits are available for radioactive and chemiluminescent labeling (Amersham Pharmacia Biotech) and for alkaline phosphatase labeling (Life Technologies). Alternatively, dithp may be cloned into commercially available vectors for the production of RNA probes. Such probes may be transcribed in the presence of at least one labeled nucleotide (e.g., 32P-ATP, Amersham Pharmacia Biotech).
Additionally, the polynucleotides of SEQ ID NO:1-56 or suitable fragments thereof can be used to isolate full length cDNA sequences utilizing hybridization and/or amplification procedures well known in the art, e.g., cDNA library screening, PCR amplification, etc. The molecular cloning of such full length cDNA sequences may employ the method of cDNA library screening with probes using the hybridization, stringency, washing, and probing strategies described above and in Ausubel, supra, Chapters 3, 5, and 6. These procedures may also be employed with genomic libraries to isolate genomic sequences of dithp in order to analyze, e.g., regulatory elements.
Genetic Mapping
Gene identification and mapping are important in the investigation and treatment of almost all conditions, diseases, and disorders. Cancer, cardiovascular disease, Alzheimer's disease, arthritis, diabetes, and mental illnesses are of particular interest. Each of these conditions is more complex than the single gene defects of sickle cell anemia or cystic fibrosis, with select groups of genes being predictive of predisposition for a particular condition, disease, or disorder. For example, cardiovascular disease may result from malfunctioning receptor molecules that fail to clear cholesterol from the bloodstream, and diabetes may result when a particular individual's immune system is activated by an infection and attacks the insulin-producing cells of the pancreas. In some studies, Alzheimer's disease has been linked to a gene on chromosome 21; other studies predict a different gene and location. Mapping of disease genes is a complex and reiterative process and generally proceeds from genetic linkage analysis to physical mapping.
As a condition is noted among members of a family, a genetic linkage map traces parts of chromosomes that are inherited in the same pattern as the condition. Statistics link the inheritance of particular conditions to particular regions of chromosomes, as defined by RFLP or other markers. (See, for example, Lander, E.S. and Botstein, D. (1986) Proc. Natl. Acad. Sci. USA 83:7353-7357.) Occasionally, genetic markers and their locations are known from previous studies. More often, however, the markers are simply stretches of DNA that differ among individuals. Examples of genetic linkage maps can be found in various scientific journals or at the Online Mendelian Inheritance in Man (OMIM) World Wide Web site.
In another embodiment of the invention, dithp sequences may be used to generate hybridization probes useful in chromosomal mapping of naturally occurring genomic sequences. Either coding or noncoding sequences of dithp may be used, and in some instances, noncoding sequences may be preferable over coding sequences. For example, conservation of a dithp coding sequence among members of a multi-gene family may potentially cause undesired cross hybridization during chromosomal mapping. The sequences may be mapped to a particular chromosome, to a specific region of a chromosome, or to artificial chromosome constructions, e.g., human artificial chromosomes (HACs), yeast artificial chromosomes (YACs), bacterial artificial chromosomes (BACs), bacterial P1 constructions, or single chromosome cDNA libraries. (See, e.g., Harrington, J.J. et al. (1997) Nat. Genet. 15:345-355; Price, C.M. (1993) Blood Rev. 7:127-134; and Trask, B.J. (1991) Trends Genet. 7:149-154.)
Fluorescent in situ hybridization (FISH) may be correlated with other physical chromosome mapping techniques and genetic map data. (See, e.g., Meyers, supra, pp. 965-968.) Correlation between the location of dithp on a physical chromosomal map and a specific disorder, or a predisposition to a specific disorder, may help define the region of DNA associated with that disorder. The dithp sequences may also be used to detect polymorphisms that are genetically linked to the inheritance of a particular condition, disease, or disorder.
In situ hybridization of chromosomal preparations and genetic mapping techniques, such as linkage analysis using established chromosomal markers, may be used for extending existing genetic maps. Often the placement of a gene on the chromosome of another mammalian species, such as mouse, may reveal associated markers even if the number or arm of the corresponding human chromosome is not known. These new marker sequences can be mapped to human chromosomes and may provide valuable information to investigators searching for disease genes using positional cloning or other gene discovery techniques. Once a disease or syndrome has been crudely correlated by genetic linkage with a particular genomic region, e.g., ataxia-telangiectasia to 11q22-23, any sequences mapping to that area may represent associated or regulatory genes for further investigation. (See, e.g., Gatti, R.A. et al. (1988) Nature 336:577-580.) The nucleotide sequences of the subject invention may also be used to detect differences in chromosomal architecture due to translocation, inversion, etc., among normal, carrier, or affected individuals.
Once a disease-associated gene is mapped to a chromosomal region, the gene must be cloned in order to identify mutations or other alterations (e.g., translocations or inversions) that may be correlated with disease. This process requires a physical map of the chromosomal region containing the disease gene of interest along with associated markers. A physical map is necessary for determining the nucleotide sequence of and order of marker genes on a particular chromosomal region. Physical mapping techniques are well known in the art and require the generation of overlapping sets of cloned DNA fragments from a particular organelle, chromosome, or genome. These clones are analyzed to reconstruct and catalog their order. Once the position of a marker is determined, the DNA from that region is obtained by consulting the catalog and selecting clones from that region. The gene of interest is located through positional cloning techniques using hybridization or similar methods.
Diagnostic Uses
The dithp of the present invention may be used to design probes useful in diagnostic assays. Such assays, well known to those skilled in the art, may be used to detect or confirm conditions, disorders, or diseases associated with abnormal levels of dithp expression. Labeled probes developed from dithp sequences are added to a sample under hybridizing conditions of desired stringency. In some instances, dithp, or fragments or oligonucleotides derived from dithp, may be used as primers in amplification steps prior to hybridization. The amount of hybridization complex formed is quantified and compared with standards for that cell or tissue. If dithp expression varies significantly from the standard, the assay indicates the presence of the condition, disorder, or disease. Qualitative or quantitative diagnostic methods may include northern, dot blot, or other membrane or dip-stick based technologies or multiple-sample format technologies such as PCR, enzyme-linked immunosorbent assay (ELISA)-like, pin, or chip-based assays.
The probes described above may also be used to monitor the progress of conditions, disorders, or diseases associated with abnormal levels of dithp expression, or to evaluate the efficacy of a particular therapeutic treatment. The candidate probe may be identified from the dithp that are specific to a given human tissue and have not been observed in GenBank or other genome databases. Such a probe may be used in animal studies, preclinical tests, clinical trials, or in monitoring the treatment of an individual patient. In a typical process, standard expression is established by methods well known in the art for use as a basis of comparison, samples from patients affected by the disorder or disease are combined with the probe to evaluate any deviation from the standard profile, and a therapeutic agent is administered and effects are monitored to generate a treatment profile. 
Efficacy is evaluated by determining whether the expression progresses toward or returns to the standard normal pattern. Treatment profiles may be generated over a period of several days or several months. Statistical methods well known to those skilled in the art may be used to determine the significance of such therapeutic agents.
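By way of illustration only, the comparison of a quantified hybridization signal with a standard, and the evaluation of a treatment profile, might be sketched as follows. The z-score test, the cutoff of 2.0, and all function names and signal values are assumptions introduced for this sketch; the specification does not prescribe any particular statistical method.

```python
from statistics import mean, stdev

def deviation_from_standard(sample_signal, reference_signals):
    """Return the z-score of a sample signal relative to reference standards."""
    mu = mean(reference_signals)
    sigma = stdev(reference_signals)  # sample SD; needs at least two references
    if sigma == 0:
        raise ValueError("reference signals show no variation")
    return (sample_signal - mu) / sigma

def is_abnormal(sample_signal, reference_signals, z_cutoff=2.0):
    """Flag a sample lying more than z_cutoff SDs from the standard (assumed rule)."""
    return abs(deviation_from_standard(sample_signal, reference_signals)) > z_cutoff

def moves_toward_standard(treatment_profile, reference_signals):
    """True if the last time point is closer to the standard mean than the first."""
    mu = mean(reference_signals)
    return abs(treatment_profile[-1] - mu) < abs(treatment_profile[0] - mu)

# Hypothetical signals in arbitrary units.
standard = [10.0, 11.0, 9.0, 10.0]
print(is_abnormal(15.0, standard))                          # patient sample
print(moves_toward_standard([15.0, 13.0, 11.0], standard))  # treatment profile
```

In practice the choice of test and cutoff would depend on the assay format and on the number of reference samples available.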
The polynucleotides are also useful for identifying individuals from minute biological samples, for example, by matching the RFLP pattern of a sample's DNA to that of an individual's DNA. The polynucleotides of the present invention can also be used to determine the actual base-by-base DNA sequence of selected portions of an individual's genome. These sequences can be used to prepare PCR primers for amplifying and isolating such selected DNA, which can then be sequenced. Using this technique, an individual can be identified through a unique set of DNA sequences. Once a unique ID database is established for an individual, positive identification of that individual can be made from extremely small tissue samples.
In a particular aspect, oligonucleotide primers derived from the dithp of the invention may be used to detect single nucleotide polymorphisms (SNPs). SNPs are substitutions, insertions and deletions that are a frequent cause of inherited or acquired genetic disease in humans. Methods of SNP detection include, but are not limited to, single-stranded conformation polymorphism (SSCP) and fluorescent SSCP (fSSCP) methods. In SSCP, oligonucleotide primers derived from dithp are used to amplify DNA using the polymerase chain reaction (PCR). The DNA may be derived, for example, from diseased or normal tissue, biopsy samples, bodily fluids, and the like. SNPs in the DNA cause differences in the secondary and tertiary structures of PCR products in single-stranded form, and these differences are detectable using gel electrophoresis in non-denaturing gels. In fSSCP, the oligonucleotide primers are fluorescently labeled, which allows detection of the amplimers in high-throughput equipment such as DNA sequencing machines. Additionally, sequence database analysis methods, termed in silico SNP (isSNP), are capable of identifying polymorphisms by comparing the sequences of individual overlapping DNA fragments which assemble into a common consensus sequence. These computer-based methods filter out sequence variations due to laboratory preparation of DNA and sequencing errors using statistical models and automated analyses of DNA sequence chromatograms. In the alternative, SNPs may be detected and characterized by mass spectrometry using, for example, the high throughput MASSARRAY system (Sequenom, Inc., San Diego CA). DNA-based identification techniques are critical in forensic technology. DNA sequences taken from very small biological samples such as tissues, e.g., hair or skin, or body fluids, e.g., blood, saliva, semen, etc., can be amplified using, e.g., PCR, to identify individuals. (See, e.g., Erlich, H. PCR Technology, Freeman and Co., New York, NY.) 
Similarly, polynucleotides of the present invention can be used as polymorphic markers.
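As a rough illustration of the in silico SNP approach mentioned above, the following sketch compares aligned overlapping fragments column by column against their consensus. It assumes the fragments are already aligned to a common coordinate system and padded with "-" where they do not cover a position. A simple minimum read count stands in for the statistical models and chromatogram analyses referred to in the text; that substitution, the function name, and the example reads are all hypothetical.

```python
from collections import Counter

def candidate_snps(aligned_reads, min_minor_reads=2):
    """Return {position: (consensus_base, minor_base)} for alignment columns
    where a non-consensus base appears in at least min_minor_reads reads."""
    snps = {}
    length = max(len(read) for read in aligned_reads)
    for pos in range(length):
        # Collect called bases in this column, skipping gaps and ambiguous calls.
        column = [r[pos] for r in aligned_reads if pos < len(r) and r[pos] in "ACGT"]
        counts = Counter(column).most_common()
        if len(counts) < 2:
            continue  # column is invariant or uncovered
        (consensus, _), (minor, minor_n) = counts[0], counts[1]
        if minor_n >= min_minor_reads:
            snps[pos] = (consensus, minor)
    return snps

# Hypothetical aligned fragments: position 2 varies in two reads (kept),
# position 5 varies in only one read (treated as a likely sequencing error).
reads = ["ACGTAC", "ACGTAC", "ACCTAC", "ACCTAC", "ACGTAT", "ACGTAC"]
print(candidate_snps(reads))
```

A single discordant read is discarded here as a probable error, whereas a variant supported by several independent fragments is reported as a candidate polymorphism.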
There is also a need for reagents capable of identifying the source of a particular tissue. Appropriate reagents can comprise, for example, DNA probes or primers prepared from the sequences of the present invention that are specific for particular tissues. Panels of such reagents can identify tissue by species and/or by organ type. In a similar fashion, these reagents can be used to screen tissue cultures for contamination.
The polynucleotides of the present invention can also be used as molecular weight markers on nucleic acid gels or Southern blots, as diagnostic probes for the presence of a specific mRNA in a particular cell type, in the creation of subtracted cDNA libraries which aid in the discovery of novel polynucleotides, in selection and synthesis of oligomers for attachment to an array or other support, and as an antigen to elicit an immune response.
Disease Model Systems Using dithp
The dithp of the invention or their mammalian homologs may be "knocked out" in an animal model system using homologous recombination in embryonic stem (ES) cells. Such techniques are well known in the art and are useful for the generation of animal models of human disease. (See, e.g., U.S. Patent Number 5,175,383 and U.S. Patent Number 5,767,337.) For example, mouse ES cells, such as the mouse 129/SvJ cell line, are derived from the early mouse embryo and grown in culture. The ES cells are transformed with a vector containing the gene of interest disrupted by a marker gene, e.g., the neomycin phosphotransferase gene (neo; Capecchi, M.R. (1989) Science 244:1288-1292). The vector integrates into the corresponding region of the host genome by homologous recombination. Alternatively, homologous recombination takes place using the Cre-loxP system to knockout a gene of interest in a tissue- or developmental stage-specific manner (Marth, J.D. (1996) Clin. Invest. 97:1999-2002; Wagner, K.U. et al. (1997) Nucleic Acids Res. 25:4323-4330). Transformed ES cells are identified and microinjected into mouse cell blastocysts such as those from the C57BL/6 mouse strain. The blastocysts are surgically transferred to pseudopregnant dams, and the resulting chimeric progeny are genotyped and bred to produce heterozygous or homozygous strains. Transgenic animals thus generated may be tested with potential therapeutic or toxic agents.
The dithp of the invention may also be manipulated in vitro in ES cells derived from human blastocysts. Human ES cells have the potential to differentiate into at least eight separate cell lineages including endoderm, mesoderm, and ectodermal cell types. These cell lineages differentiate into, for example, neural cells, hematopoietic lineages, and cardiomyocytes (Thomson, J.A. et al. (1998) Science 282:1145-1147). 
The dithp of the invention can also be used to create "knockin" humanized animals (pigs) or transgenic animals (mice or rats) to model human disease. With knockin technology, a region of dithp is injected into animal ES cells, and the injected sequence integrates into the animal cell genome. Transformed cells are injected into blastulae, and the blastulae are implanted as described above. Transgenic progeny or inbred lines are studied and treated with potential pharmaceutical agents to obtain information on treatment of a human disease. Alternatively, a mammal inbred to overexpress dithp, resulting, e.g., in the secretion of DITHP in its milk, may also serve as a convenient source of that protein (Janne, J. et al. (1998) Biotechnol. Annu. Rev. 4:55-74).
Screening Assays
DITHP encoded by polynucleotides of the present invention may be used to screen for molecules that bind to or are bound by the encoded polypeptides. The binding of the polypeptide and the molecule may activate (agonist), increase, inhibit (antagonist), or decrease activity of the polypeptide or the bound molecule. Examples of such molecules include antibodies, oligonucleotides, proteins (e.g., receptors), or small molecules. Preferably, the molecule is closely related to the natural ligand of the polypeptide, e.g., a ligand or fragment thereof, a natural substrate, or a structural or functional mimetic. (See Coligan et al. (1991) Current Protocols in Immunology 1(2): Chapter 5.) Similarly, the molecule can be closely related to the natural receptor to which the polypeptide binds, or to at least a fragment of the receptor, e.g., the active site. In either case, the molecule can be rationally designed using known techniques. Preferably, the screening for these molecules involves producing appropriate cells which express the polypeptide, either as a secreted protein or on the cell membrane. Preferred cells include cells from mammals, yeast, Drosophila, or E. coli. Cells expressing the polypeptide or cell membrane fractions which contain the expressed polypeptide are then contacted with a test compound and binding, stimulation, or inhibition of activity of either the polypeptide or the molecule is analyzed. An assay may simply test binding of a candidate compound to the polypeptide, wherein binding is detected by a fluorophore, radioisotope, enzyme conjugate, or other detectable label. Alternatively, the assay may assess binding in the presence of a labeled competitor.
Additionally, the assay can be carried out using cell-free preparations, polypeptide/molecule affixed to a solid support, chemical libraries, or natural product mixtures. The assay may also simply comprise the steps of mixing a candidate compound with a solution containing a polypeptide, measuring polypeptide/molecule activity or binding, and comparing the polypeptide/molecule activity or binding to a standard.
Preferably, an ELISA assay using, e.g., a monoclonal or polyclonal antibody, can measure polypeptide level in a sample. The antibody can measure polypeptide level by either binding, directly or indirectly, to the polypeptide or by competing with the polypeptide for a substrate. All of the above assays can be used in a diagnostic or prognostic context. The molecules discovered using these assays can be used to treat disease or to bring about a particular result in a patient (e.g., blood vessel growth) by activating or inhibiting the polypeptide/molecule. Moreover, the assays can discover agents which may inhibit or enhance the production of the polypeptide from suitably manipulated cells or tissues.
Transcript Imaging and Toxicological Testing
Another embodiment relates to the use of dithp to develop a transcript image of a tissue or cell type. A transcript image represents the global pattern of gene expression by a particular tissue or cell type. Global gene expression patterns are analyzed by quantifying the number of expressed genes and their relative abundance under given conditions and at a given time. (See Seilhamer et al., "Comparative Gene Transcript Analysis," U.S. Patent Number 5,840,484, expressly incorporated by reference herein.) Thus a transcript image may be generated by hybridizing the polynucleotides of the present invention or their complements to the totality of transcripts or reverse transcripts of a particular tissue or cell type. In one embodiment, the hybridization takes place in high-throughput format, wherein the polynucleotides of the present invention or their complements comprise a subset of a plurality of elements on a microarray. The resultant transcript image would provide a profile of gene activity pertaining to human molecules for diagnostics and therapeutics.
Transcript images which profile dithp expression may be generated using transcripts isolated from tissues, cell lines, biopsies, or other biological samples. The transcript image may thus reflect dithp expression in vivo, as in the case of a tissue or biopsy sample, or in vitro, as in the case of a cell line.
Transcript images which profile dithp expression may also be used in conjunction with in vitro model systems and preclinical evaluation of pharmaceuticals, as well as toxicological testing of industrial and naturally-occurring environmental compounds. All compounds induce characteristic gene expression patterns, frequently termed molecular fingerprints or toxicant signatures, which are indicative of mechanisms of action and toxicity (Nuwaysir, E.F. et al. (1999) Mol. Carcinog. 24:153-159; Steiner, S. and Anderson, N.L. (2000) Toxicol. Lett. 112-113:467-71, expressly incorporated by reference herein). If a test compound has a signature similar to that of a compound with known toxicity, it is likely to share those toxic properties. These fingerprints or signatures are most useful and refined when they contain expression information from a large number of genes and gene families. Ideally, a genome-wide measurement of expression provides the highest quality signature. Genes whose expression is not altered by any tested compounds are important as well, as the levels of expression of these genes are used to normalize the rest of the expression data. The normalization procedure is useful for comparison of expression data after treatment with different compounds. While the assignment of gene function to elements of a toxicant signature aids in interpretation of toxicity mechanisms, knowledge of gene function is not necessary for the statistical matching of signatures which leads to prediction of toxicity. (See, for example, Press Release 00-02 from the National Institute of Environmental Health Sciences, released February 29, 2000, available at http://www.niehs.nih.gov/oc/news/toxchip.htm.) Therefore, it is important and desirable in toxicological screening using toxicant signatures to include all expressed gene sequences.
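The normalization and signature matching described above might be sketched as follows, purely as an illustration. The sketch assumes that each signature is a mapping from a gene name to an expression level, that the invariant genes are known in advance, and that Pearson correlation serves as the similarity measure; none of these choices, and none of the gene names or values, is taken from the specification.

```python
from math import sqrt

def normalize(profile, invariant_genes):
    """Scale a {gene: level} profile so the invariant genes average 1.0."""
    scale = sum(profile[g] for g in invariant_genes) / len(invariant_genes)
    return {gene: level / scale for gene, level in profile.items()}

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences with nonzero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def signature_similarity(test, known, invariant_genes):
    """Correlate two normalized signatures over the genes they share."""
    t = normalize(test, invariant_genes)
    k = normalize(known, invariant_genes)
    genes = sorted(set(t) & set(k))
    return pearson([t[g] for g in genes], [k[g] for g in genes])

# Hypothetical signatures: the test array was read at half the overall signal,
# which normalization to the invariant gene "ref" removes.
known_toxicant = {"g1": 2.0, "g2": 8.0, "g3": 1.0, "ref": 4.0}
test_compound = {"g1": 1.0, "g2": 4.0, "g3": 0.5, "ref": 2.0}
print(signature_similarity(test_compound, known_toxicant, ["ref"]))
```

Note that the matching step uses only expression levels; as the text observes, no knowledge of gene function is needed for it.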
In one embodiment, the toxicity of a test compound is assessed by treating a biological sample containing nucleic acids with the test compound. Nucleic acids that are expressed in the treated biological sample are hybridized with one or more probes specific to the polynucleotides of the present invention, so that transcript levels corresponding to the polynucleotides of the present invention may be quantified. The transcript levels in the treated biological sample are compared with levels in an untreated biological sample. Differences in the transcript levels between the two samples are indicative of a toxic response caused by the test compound in the treated sample.
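The comparison of treated and untreated transcript levels can be sketched as a log2 ratio per transcript. The function name, the transcript labels, and the two-fold default threshold are assumptions for illustration only; the text does not state what magnitude of difference is considered indicative.

```python
import math

def differential_transcripts(treated, untreated, min_abs_log2=1.0):
    """Return {transcript: log2(treated/untreated)} for transcripts whose
    level changed by at least `min_abs_log2` in either direction.

    Both inputs map transcript names to positive, already-normalized
    levels. The default threshold (two-fold) is an assumed cutoff.
    """
    changed = {}
    for name in treated.keys() & untreated.keys():
        log_ratio = math.log2(treated[name] / untreated[name])
        if abs(log_ratio) >= min_abs_log2:
            changed[name] = log_ratio
    return changed
```

A positive value indicates a transcript that is more abundant in the treated sample, and a negative value one that is less abundant.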
Another particular embodiment relates to the use of DITHP encoded by polynucleotides of the present invention to analyze the proteome of a tissue or cell type. The term proteome refers to the global pattern of protein expression in a particular tissue or cell type. Each protein component of a proteome can be subjected individually to further analysis. Proteome expression patterns, or profiles, are analyzed by quantifying the number of expressed proteins and their relative abundance under given conditions and at a given time. A profile of a cell's proteome may thus be generated by separating and analyzing the polypeptides of a particular tissue or cell type. In one embodiment, the separation is achieved using two-dimensional gel electrophoresis, in which proteins from a sample are separated by isoelectric focusing in the first dimension, and then according to molecular weight by sodium dodecyl sulfate slab gel electrophoresis in the second dimension (Steiner and Anderson, supra). The proteins are visualized in the gel as discrete and uniquely positioned spots, typically by staining the gel with an agent such as Coomassie Blue or silver or fluorescent stains. The optical density of each protein spot is generally proportional to the level of the protein in the sample. The optical densities of equivalently positioned protein spots from different samples, for example, from biological samples either treated or untreated with a test compound or therapeutic agent, are compared to identify any changes in protein spot density related to the treatment. The proteins in the spots are partially sequenced using, for example, standard methods employing chemical or enzymatic cleavage followed by mass spectrometry. The identity of the protein in a spot may be determined by comparing its partial sequence, preferably of at least 5 contiguous amino acid residues, to the polypeptide sequences of the present invention. In some cases, further sequence data may be obtained for definitive protein identification.
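The comparison of a partial sequence against a set of polypeptide sequences can be sketched as an exact substring search. The function name, the reference identifiers, and the use of exact matching (rather than a tolerant alignment) are assumptions for illustration; only the minimum length of 5 contiguous residues comes from the text.

```python
def match_partial_sequence(partial, references, min_len=5):
    """Return the names of reference polypeptides that contain `partial`
    as a contiguous run of residues.

    `references` maps hypothetical identifiers to one-letter amino acid
    sequences. Partial sequences shorter than `min_len` are rejected,
    following the stated preference for at least 5 contiguous residues.
    """
    partial = partial.upper()
    if len(partial) < min_len:
        raise ValueError(f"partial sequence must have at least {min_len} residues")
    return sorted(
        name for name, seq in references.items() if partial in seq.upper()
    )
```

A short exact match can occur in more than one protein, which is consistent with the remark that further sequence data may be needed for definitive identification.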
A proteomic profile may also be generated using antibodies specific for DITHP to quantify the levels of DITHP expression. In one embodiment, the antibodies are used as elements on a microarray, and protein expression levels are quantified by exposing the microarray to the sample and detecting the levels of protein bound to each array element (Lueking, A. et al. (1999) Anal. Biochem. 270:103-11; Mendoza, L.G. et al. (1999) Biotechniques 27:778-88). Detection may be performed by a variety of methods known in the art, for example, by reacting the proteins in the sample with a thiol- or amino-reactive fluorescent compound and detecting the amount of fluorescence bound at each array element.
Toxicant signatures at the proteome level are also useful for toxicological screening, and should be analyzed in parallel with toxicant signatures at the transcript level. There is a poor correlation between transcript and protein abundances for some proteins in some tissues (Anderson, N.L. and Seilhamer, J. (1997) Electrophoresis 18:533-537), so proteome toxicant signatures may be useful in the analysis of compounds which do not significantly affect the transcript image, but which alter the proteomic profile. In addition, the analysis of transcripts in body fluids is difficult, due to rapid degradation of mRNA, so proteomic profiling may be more reliable and informative in such cases.
In another embodiment, the toxicity of a test compound is assessed by treating a biological sample containing proteins with the test compound. Proteins that are expressed in the treated biological sample are separated so that the amount of each protein can be quantified. The amount of each protein is compared to the amount of the corresponding protein in an untreated biological sample. A difference in the amount of protein between the two samples is indicative of a toxic response to the test compound in the treated sample. Individual proteins are identified by sequencing the amino acid residues of the individual proteins and comparing these partial sequences to the DITHP encoded by polynucleotides of the present invention.
In another embodiment, the toxicity of a test compound is assessed by treating a biological sample containing proteins with the test compound. Proteins from the biological sample are incubated with antibodies specific to the DITHP encoded by polynucleotides of the present invention. The amount of protein recognized by the antibodies is quantified. The amount of protein in the treated biological sample is compared with the amount in an untreated biological sample. A difference in the amount of protein between the two samples is indicative of a toxic response to the test compound in the treated sample.
Transcript images may be used to profile dithp expression in distinct tissue types. This process can be used to determine human molecule activity in a particular tissue type relative to this activity in a different tissue type. Transcript images may be used to generate a profile of dithp expression characteristic of diseased tissue. Transcript images of tissues before and after treatment may be used for diagnostic purposes, to monitor the progression of disease, and to monitor the efficacy of drug treatments for diseases which affect the activity of human molecules. Transcript images of cell lines can be used to assess human molecule activity and/or to identify cell lines that lack or misregulate this activity. Such cell lines may then be treated with pharmaceutical agents, and a transcript image following treatment may indicate the efficacy of these agents in restoring desired levels of this activity. A similar approach may be used to assess the toxicity of pharmaceutical agents as reflected by undesirable changes in human molecule activity. Candidate pharmaceutical agents may be evaluated by comparing their associated transcript images with those of pharmaceutical agents of known effectiveness.
Antisense Molecules
The polynucleotides of the present invention are useful in antisense technology. Antisense technology or therapy relies on the modulation of expression of a target protein through the specific binding of an antisense sequence to a target sequence encoding the target protein or directing its expression. (See, e.g., Agrawal, S., ed. (1996) Antisense Therapeutics, Humana Press Inc., Totowa NJ; Alama, A. et al. (1997) Pharmacol. Res. 36(3):171-178; Crooke, S.T. (1997) Adv. Pharmacol. 40:1-49; Sharma, H.W. and R. Narayanan (1995) Bioessays 17(12):1055-1063; and Lavrosky, Y. et al. (1997) Biochem. Mol. Med. 62(1):11-22.) An antisense sequence is a polynucleotide sequence capable of specifically hybridizing to at least a portion of the target sequence. Antisense sequences bind to cellular mRNA and/or genomic DNA, affecting translation and/or transcription. Antisense sequences can be DNA, RNA, or nucleic acid mimics and analogs. (See, e.g., Rossi, J.J. et al. (1991) Antisense Res. Dev. 1(3):285-288; Lee, R. et al. (1998) Biochemistry 37(3):900-1010; Pardridge, W.M. et al. (1995) Proc. Natl. Acad. Sci. USA 92(12):5592-5596; and Nielsen, P.E. and Haaima, G. (1997) Chem. Soc. Rev. 96:73-78.) Typically, the binding which results in modulation of expression occurs through hybridization or binding of complementary base pairs. Antisense sequences can also bind to DNA duplexes through specific interactions in the major groove of the double helix. The polynucleotides of the present invention and fragments thereof can be used as antisense sequences to modify the expression of the polypeptide encoded by dithp. The antisense sequences can be produced ex vivo, such as by using any of the ABI nucleic acid synthesizer series (Applied Biosystems) or other automated systems known in the art. Antisense sequences can also be produced biologically, such as by transforming an appropriate host cell with an expression vector containing the sequence of interest. (See, e.g., Agrawal, supra.)
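Because an antisense sequence hybridizes to its target through complementary base pairing, a DNA antisense oligonucleotide for a chosen target region is the reverse complement of that region. The sketch below illustrates this; the function name and the example sequence are hypothetical, and no claim is made about which target regions would be effective.

```python
# Watson-Crick complements for DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def antisense_oligo(target, start, length):
    """Return the DNA antisense oligonucleotide (written 5'->3') that is
    complementary to target[start:start + length].

    `target` is the sense-strand DNA sequence, written 5'->3'.
    """
    region = target.upper()[start:start + length]
    if len(region) != length:
        raise ValueError("requested region extends past the end of the target")
    # Complement each base, then reverse so the result also reads 5'->3'.
    return region.translate(_COMPLEMENT)[::-1]
```

For example, `antisense_oligo("ATGGCCAAGT", 0, 6)` gives `"GGCCAT"`, the reverse complement of `ATGGCC`.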
In therapeutic use, any gene delivery system suitable for introduction of the antisense sequences into appropriate target cells can be used. Antisense sequences can be delivered intracellularly in the form of an expression plasmid which, upon transcription, produces a sequence complementary to at least a portion of the cellular sequence encoding the target protein. (See, e.g., Slater, J.E., et al. (1998) J. Allergy Clin. Immunol. 102(3):469-475; and Scanlon, K.J., et al. (1995) 9(13):1288-1296.) Antisense sequences can also be introduced intracellularly through the use of viral vectors, such as retrovirus and adeno-associated virus vectors. (See, e.g., Miller, A.D. (1990) Blood 76:271; Ausubel, F.M. et al. (1995) Current Protocols in Molecular Biology, John Wiley & Sons, New York NY; Uckert, W. and W. Walther (1994) Pharmacol. Ther. 63(3):323-347.) Other gene delivery mechanisms include liposome-derived systems, artificial viral envelopes, and other systems known in the art. (See, e.g., Rossi, J.J. (1995) Br. Med. Bull. 51(1):217-225; Boado, R.J. et al. (1998) J. Pharm. Sci. 87(11):1308-1315; and Morris, M. et al. (1997) Nucleic Acids Res. 25(14):2730-2736.)
Expression
In order to express a biologically active DITHP, the nucleotide sequences encoding DITHP or fragments thereof may be inserted into an appropriate expression vector, i.e., a vector which contains the necessary elements for transcriptional and translational control of the inserted coding sequence in a suitable host. Methods which are well known to those skilled in the art may be used to construct expression vectors containing sequences encoding DITHP and appropriate transcriptional and translational control elements. These methods include in vitro recombinant DNA techniques, synthetic techniques, and in vivo genetic recombination. (See, e.g., Sambrook, supra, Chapters 4, 8, 16, and 17; and Ausubel, supra, Chapters 9, 10, 13, and 16.)
A variety of expression vector/host systems may be utilized to contain and express sequences encoding DITHP. These include, but are not limited to, microorganisms such as bacteria transformed with recombinant bacteriophage, plasmid, or cosmid DNA expression vectors; yeast transformed with yeast expression vectors; insect cell systems infected with viral expression vectors (e.g., baculovirus); plant cell systems transformed with viral expression vectors (e.g., cauliflower mosaic virus, CaMV, or tobacco mosaic virus, TMV) or with bacterial expression vectors (e.g., Ti or pBR322 plasmids); or animal (mammalian) cell systems. (See, e.g., Sambrook, supra; Ausubel, 1995, supra; Van Heeke, G. and S.M. Schuster (1989) J. Biol. Chem. 264:5503-5509; Bitter, G.A. et al. (1987) Methods Enzymol. 153:516-544; Scorer, C.A. et al. (1994) Bio/Technology 12:181-184; Engelhard, E.K. et al. (1994) Proc. Natl. Acad. Sci. USA 91:3224-3227; Sandig, V. et al. (1996) Hum. Gene Ther. 7:1937-1945; Takamatsu, N. (1987) EMBO J. 6:307-311; Coruzzi, G. et al. (1984) EMBO J. 3:1671-1680; Broglie, R. et al. (1984) Science 224:838-843; Winter, J. et al. (1991) Results Probl. Cell Differ. 17:85-105; The McGraw Hill Yearbook of Science and Technology (1992) McGraw Hill, New York NY, pp. 191-196; Logan, J. and T. Shenk (1984) Proc. Natl. Acad. Sci. USA 81:3655-3659; and Harrington, J.J. et al. (1997) Nat. Genet. 15:345-355.) Expression vectors derived from retroviruses, adenoviruses, or herpes or vaccinia viruses, or from various bacterial plasmids, may be used for delivery of nucleotide sequences to the targeted organ, tissue, or cell population. (See, e.g., Di Nicola, M. et al. (1998) Cancer Gen. Ther. 5(6):350-356; Yu, M. et al., (1993) Proc. Natl. Acad. Sci. USA 90(13):6340-6344; Buller, R.M. et al. (1985) Nature 317(6040):813-815; McGregor, D.P. et al. (1994) Mol. Immunol. 31(3):219-226; and Verma, I.M. and N. Somia (1997) Nature 389:239-242.) The invention is not limited by the host cell employed.
For long term production of recombinant proteins in mammalian systems, stable expression of DITHP in cell lines is preferred. For example, sequences encoding DITHP can be transformed into cell lines using expression vectors which may contain viral origins of replication and/or endogenous expression elements and a selectable marker gene on the same or on a separate vector. Any number of selection systems may be used to recover transformed cell lines. (See, e.g., Wigler, M. et al. (1977) Cell 11:223-232; Lowy, I. et al. (1980) Cell 22:817-823; Wigler, M. et al. (1980) Proc. Natl. Acad. Sci. USA 77:3567-3570; Colbere-Garapin, F. et al. (1981) J. Mol. Biol. 150:1-14; Hartman, S.C. and R.C. Mulligan (1988) Proc. Natl. Acad. Sci. USA 85:8047-8051; Rhodes, C.A. (1995) Methods Mol. Biol. 55:121-131.)
Therapeutic Uses of dithp
The dithp of the invention may be used for somatic or germline gene therapy. Gene therapy may be performed to (i) correct a genetic deficiency (e.g., in the cases of severe combined immunodeficiency (SCID)-X1 disease characterized by X-linked inheritance (Cavazzana-Calvo, M. et al. (2000) Science 288:669-672), severe combined immunodeficiency syndrome associated with an inherited adenosine deaminase (ADA) deficiency (Blaese, R.M. et al. (1995) Science 270:475-480; Bordignon, C. et al. (1995) Science 270:470-475), cystic fibrosis (Zabner, J. et al. (1993) Cell 75:207-216; Crystal, R.G. et al. (1995) Hum. Gene Therapy 6:643-666; Crystal, R.G. et al. (1995) Hum. Gene Therapy 6:667-703), thalassemias, familial hypercholesterolemia, and hemophilia resulting from Factor VIII or Factor IX deficiencies (Crystal, R.G. (1995) Science 270:404-410; Verma, I.M. and Somia, N. (1997) Nature 389:239-242)), (ii) express a conditionally lethal gene product (e.g., in the case of cancers which result from unregulated cell proliferation), or (iii) express a protein which affords protection against intracellular parasites (e.g., against human retroviruses, such as human immunodeficiency virus (HIV) (Baltimore, D. (1988) Nature 335:395-396; Poeschla, E. et al. (1996) Proc. Natl. Acad. Sci. USA 93:11395-11399), hepatitis B or C virus (HBV, HCV); fungal parasites, such as Candida albicans and Paracoccidioides brasiliensis; and protozoan parasites such as Plasmodium falciparum and Trypanosoma cruzi). In the case where a genetic deficiency in dithp expression or regulation causes disease, the expression of dithp from an appropriate population of transduced cells may alleviate the clinical manifestations caused by the genetic deficiency.
In a further embodiment of the invention, diseases or disorders caused by deficiencies in dithp are treated by constructing mammalian expression vectors comprising dithp and introducing these vectors by mechanical means into dithp-deficient cells. Mechanical transfer technologies for use with cells in vivo or ex vitro include (i) direct DNA microinjection into individual cells, (ii) ballistic gold particle delivery, (iii) liposome-mediated transfection, (iv) receptor-mediated gene transfer, and (v) the use of DNA transposons (Morgan, R.A. and Anderson, W.F. (1993) Annu. Rev. Biochem. 62:191-217; Ivics, Z. (1997) Cell 91:501-510; Boulay, J-L. and Recipon, H. (1998) Curr. Opin. Biotechnol. 9:445-450).
Expression vectors that may be effective for the expression of dithp include, but are not limited to, the PCDNA 3.1, EPITAG, PRCCMV2, PREP, PVAX vectors (Invitrogen, Carlsbad CA), PCMV-SCRIPT, PCMV-TAG, PEGSH/PERV (Stratagene, La Jolla CA), and PTET-OFF, PTET-ON, PTRE2, PTRE2-LUC, PTK-HYG (Clontech, Palo Alto CA). The dithp of the invention may be expressed using (i) a constitutively active promoter (e.g., from cytomegalovirus (CMV), Rous sarcoma virus (RSV), SV40 virus, thymidine kinase (TK), or β-actin genes), (ii) an inducible promoter (e.g., the tetracycline-regulated promoter (Gossen, M. and Bujard, H. (1992) Proc. Natl. Acad. Sci. U.S.A. 89:5547-5551; Gossen, M. et al., (1995) Science 268:1766-1769; Rossi, F.M.V. and Blau, H.M. (1998) Curr. Opin. Biotechnol. 9:451-456), commercially available in the T-REX plasmid (Invitrogen); the ecdysone-inducible promoter (available in the plasmids PVGRXR and PIND; Invitrogen); the FK506/rapamycin inducible promoter; or the RU486/mifepristone inducible promoter (Rossi, F.M.V. and Blau, H.M. supra)), or (iii) a tissue-specific promoter or the native promoter of the endogenous gene encoding DITHP from a normal individual.
Commercially available liposome transformation kits (e.g., the PERFECT LIPID TRANSFECTION KIT, available from Invitrogen) allow one with ordinary skill in the art to deliver polynucleotides to target cells in culture and require minimal effort to optimize experimental parameters. In the alternative, transformation is performed using the calcium phosphate method (Graham, F.L. and Eb, A.J. (1973) Virology 52:456-467), or by electroporation (Neumann, E. et al. (1982) EMBO J. 1:841-845). The introduction of DNA to primary cells requires modification of these standardized mammalian transfection protocols.
In another embodiment of the invention, diseases or disorders caused by genetic defects with respect to dithp expression are treated by constructing a retrovirus vector consisting of (i) dithp under the control of an independent promoter or the retrovirus long terminal repeat (LTR) promoter, (ii) appropriate RNA packaging signals, and (iii) a Rev-responsive element (RRE) along with additional retrovirus cis-acting RNA sequences and coding sequences required for efficient vector propagation. Retrovirus vectors (e.g., PFB and PFBNEO) are commercially available (Stratagene) and are based on published data (Riviere, I. et al. (1995) Proc. Natl. Acad. Sci. U.S.A. 92:6733-6737), incorporated by reference herein. The vector is propagated in an appropriate vector producing cell line (VPCL) that expresses an envelope gene with a tropism for receptors on the target cells or a promiscuous envelope protein such as VSVg (Armentano, D. et al. (1987) J. Virol. 61:1647-1650; Bender, M.A. et al. (1987) J. Virol. 61:1639-1646; Adam, M.A. and Miller, A.D. (1988) J. Virol. 62:3802-3806; Dull, T. et al. (1998) J. Virol. 72:8463-8471; Zufferey, R. et al. (1998) J. Virol. 72:9873-9880). U.S. Patent Number 5,910,434 to Rigg ("Method for obtaining retrovirus packaging cell lines producing high transducing efficiency retroviral supernatant") discloses a method for obtaining retrovirus packaging cell lines and is hereby incorporated by reference. Propagation of retrovirus vectors, transduction of a population of cells (e.g., CD4+ T-cells), and the return of transduced cells to a patient are procedures well known to persons skilled in the art of gene therapy and have been well documented (Ranga, U. et al. (1997) J. Virol. 71:7020-7029; Bauer, G. et al. (1997) Blood 89:2259-2267; Bonyhadi, M.L. (1997) J. Virol. 71:4707-4716; Ranga, U. et al. (1998) Proc. Natl. Acad. Sci. U.S.A. 95:1201-1206; Su, L. (1997) Blood 89:2283-2290).
In the alternative, an adenovirus-based gene therapy delivery system is used to deliver dithp to cells which have one or more genetic abnormalities with respect to the expression of dithp. The construction and packaging of adenovirus-based vectors are well known to those with ordinary skill in the art. Replication defective adenovirus vectors have proven to be versatile for importing genes encoding immunoregulatory proteins into intact islets in the pancreas (Csete, M.E. et al. (1995) Transplantation 27:263-268). Potentially useful adenoviral vectors are described in U.S. Patent Number 5,707,618 to Armentano ("Adenovirus vectors for gene therapy"), hereby incorporated by reference. For adenoviral vectors, see also Antinozzi, P.A. et al. (1999) Annu. Rev. Nutr. 19:511-544 and Verma, I.M. and Somia, N. (1997) Nature 18:389:239-242, both incorporated by reference herein.
In another alternative, a herpes-based gene therapy delivery system is used to deliver dithp to target cells which have one or more genetic abnormalities with respect to the expression of dithp. The use of herpes simplex virus (HSV)-based vectors may be especially valuable for introducing dithp to cells of the central nervous system, for which HSV has a tropism. The construction and packaging of herpes-based vectors are well known to those with ordinary skill in the art. A replication-competent herpes simplex virus (HSV) type 1-based vector has been used to deliver a reporter gene to the eyes of primates (Liu, X. et al. (1999) Exp. Eye Res. 169:385-395). The construction of a HSV-1 virus vector has also been disclosed in detail in U.S. Patent Number 5,804,413 to DeLuca ("Herpes simplex virus strains for gene transfer"), which is hereby incorporated by reference. U.S. Patent Number 5,804,413 teaches the use of recombinant HSV d92 which consists of a genome containing at least one exogenous gene to be transferred to a cell under the control of the appropriate promoter for purposes including human gene therapy. Also taught by this patent are the construction and use of recombinant HSV strains deleted for ICP4, ICP27 and ICP22. For HSV vectors, see also Goins, W.F. et al. (1999) J. Virol. 73:519-532 and Xu, H. et al., (1994) Dev. Biol. 163:152-161, hereby incorporated by reference. The manipulation of cloned herpesvirus sequences, the generation of recombinant virus following the transfection of multiple plasmids containing different segments of the large herpesvirus genomes, the growth and propagation of herpesvirus, and the infection of cells with herpesvirus are techniques well known to those of ordinary skill in the art.
In another alternative, an alphavirus (positive, single-stranded RNA virus) vector is used to deliver dithp to target cells. The biology of the prototypic alphavirus, Semliki Forest Virus (SFV), has been studied extensively and gene transfer vectors have been based on the SFV genome (Garoff, H. and Li, K-J. (1998) Curr. Opin. Biotech. 9:464-469). During alphavirus RNA replication, a subgenomic RNA is generated that normally encodes the viral capsid proteins. This subgenomic RNA replicates to higher levels than the full-length genomic RNA, resulting in the overproduction of capsid proteins relative to the viral proteins with enzymatic activity (e.g., protease and polymerase). Similarly, inserting dithp into the alphavirus genome in place of the capsid-coding region results in the production of a large number of dithp RNAs and the synthesis of high levels of DITHP in vector transduced cells. While alphavirus infection is typically associated with cell lysis within a few days, the ability to establish a persistent infection in hamster normal kidney cells (BHK-21) with a variant of Sindbis virus (SIN) indicates that the lytic replication of alphaviruses can be altered to suit the needs of the gene therapy application (Dryga, S.A. et al. (1997) Virology 228:74-83). The wide host range of alphaviruses will allow the introduction of dithp into a variety of cell types. The specific transduction of a subset of cells in a population may require the sorting of cells prior to transduction. The methods of manipulating infectious cDNA clones of alphaviruses, performing alphavirus cDNA and RNA transfections, and performing alphavirus infections, are well known to those with ordinary skill in the art.
Antibodies
Anti-DITHP antibodies may be used to analyze protein expression levels. Such antibodies include, but are not limited to, polyclonal, monoclonal, chimeric, single chain, and Fab fragments. For descriptions of and protocols of antibody technologies, see, e.g., Pound J.D. (1998) Immunochemical Protocols, Humana Press, Totowa, NJ.
The amino acid sequence encoded by the dithp of the Sequence Listing may be analyzed by appropriate software (e.g., LASERGENE NAVIGATOR software, DNASTAR) to determine regions of high immunogenicity. The optimal sequences for immunization are selected from the C-terminus, the N-terminus, and those intervening, hydrophilic regions of the polypeptide which are likely to be exposed to the external environment when the polypeptide is in its natural conformation. Analysis used to select appropriate epitopes is also described by Ausubel (1997, supra, Chapter 11.7).
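The search for hydrophilic regions can be illustrated with a sliding-window hydropathy scan. This sketch uses the Kyte-Doolittle hydropathy scale, in which lower values are more hydrophilic; it is a swapped-in, generic technique and not a description of how the LASERGENE software works. The function name and the default window of 7 residues are assumptions.

```python
# Kyte-Doolittle hydropathy values; lower (more negative) = more hydrophilic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def most_hydrophilic_window(sequence, window=7):
    """Return (start_index, peptide, mean_hydropathy) for the window with
    the lowest mean Kyte-Doolittle hydropathy. Ties keep the first window.
    """
    sequence = sequence.upper()
    if len(sequence) < window:
        raise ValueError("sequence is shorter than the window")
    best = None
    for start in range(len(sequence) - window + 1):
        peptide = sequence[start:start + window]
        mean = sum(KYTE_DOOLITTLE[aa] for aa in peptide) / window
        if best is None or mean < best[2]:
            best = (start, peptide, mean)
    return best
```

Hydrophilicity alone is only a rough proxy for surface exposure, so a window found this way would be a candidate for further evaluation rather than a confirmed epitope.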
Peptides used for antibody induction do not need to have biological activity; however, they must be antigenic. Peptides used to induce specific antibodies may have an amino acid sequence consisting of at least five amino acids, preferably at least 10 amino acids, and most preferably at least 15 amino acids. A peptide which mimics an antigenic fragment of the natural polypeptide may be fused with another protein such as keyhole limpet hemocyanin (KLH; Sigma, St. Louis MO) for antibody production. A peptide encompassing an antigenic region may be expressed from a dithp, synthesized as described above, or purified from human cells.
Procedures well known in the art may be used for the production of antibodies. Various hosts including mice, goats, and rabbits, may be immunized by injection with a peptide. Depending on the host species, various adjuvants may be used to increase immunological response.
In one procedure, peptides about 15 residues in length may be synthesized using an ABI 431A peptide synthesizer (Applied Biosystems) using fmoc-chemistry and coupled to KLH (Sigma) by reaction with M-maleimidobenzoyl-N-hydroxysuccinimide ester (Ausubel, 1995, supra). Rabbits are immunized with the peptide-KLH complex in complete Freund's adjuvant. The resulting antisera are tested for antipeptide activity by binding the peptide to plastic, blocking with 1% bovine serum albumin (BSA), reacting with rabbit antisera, washing, and reacting with radioiodinated goat anti-rabbit IgG. Antisera with antipeptide activity are tested for anti-DITHP activity using protocols well known in the art, including ELISA, radioimmunoassay (RIA), and immunoblotting.
In another procedure, isolated and purified peptide may be used to immunize mice (about 100 μg of peptide) or rabbits (about 1 mg of peptide). Subsequently, the peptide is radioiodinated and used to screen the immunized animals' B-lymphocytes for production of antipeptide antibodies. Positive cells are then used to produce hybridomas using standard techniques. About 20 mg of peptide is sufficient for labeling and screening several thousand clones. Hybridomas of interest are detected by screening with radioiodinated peptide to identify those fusions producing peptide-specific monoclonal antibody. In a typical protocol, wells of a multi-well plate (FAST, Becton-Dickinson, Palo Alto, CA) are coated with affinity-purified, specific rabbit-anti-mouse (or suitable anti-species IgG) antibodies at 10 mg/ml. The coated wells are blocked with 1% BSA and washed and exposed to supernatants from hybridomas. After incubation, the wells are exposed to radiolabeled peptide at 1 mg/ml.
Clones producing antibodies bind a quantity of labeled peptide that is detectable above background. Such clones are expanded and subjected to 2 cycles of cloning. Cloned hybridomas are injected into pristane-treated mice to produce ascites, and monoclonal antibody is purified from the ascitic fluid by affinity chromatography on protein A (Amersham Pharmacia Biotech). Several procedures for the production of monoclonal antibodies, including in vitro production, are described in Pound (supra). Monoclonal antibodies with antipeptide activity are tested for anti-DITHP activity using protocols well known in the art, including ELISA, RIA, and immunoblotting.
Antibody fragments containing specific binding sites for an epitope may also be generated. For example, such fragments include, but are not limited to, the F(ab')2 fragments produced by pepsin digestion of the antibody molecule, and the Fab fragments generated by reducing the disulfide bridges of the F(ab')2 fragments. Alternatively, construction of Fab expression libraries in filamentous bacteriophage allows rapid and easy identification of monoclonal fragments with desired specificity (Pound, supra, Chaps. 45-47). Antibodies generated against polypeptide encoded by dithp can be used to purify and characterize full-length DITHP protein and its activity, binding partners, etc.
Assays Using Antibodies
Anti-DITHP antibodies may be used in assays to quantify the amount of DITHP found in a particular human cell. Such assays include methods utilizing the antibody and a label to detect expression level under normal or disease conditions. The peptides and antibodies of the invention may be used with or without modification or labeled by joining them, either covalently or noncovalently, with a reporter molecule.
Protocols for detecting and measuring protein expression using either polyclonal or monoclonal antibodies are well known in the art. Examples include ELISA, RIA, and fluorescent activated cell sorting (FACS). Such immunoassays typically involve the formation of complexes between the DITHP and its specific antibody and the measurement of such complexes. These and other assays are described in Pound (supra).
Without further elaboration, it is believed that one skilled in the art can, using the preceding description, utilize the present invention to its fullest extent. The following preferred specific embodiments are, therefore, to be construed as merely illustrative, and not limitative of the remainder of the disclosure in any way whatsoever.
The disclosures of all patents, applications, and publications mentioned above and below, including U.S. Ser. No. 60/261,865, U.S. Ser. No. 60/262,599, U.S. Ser. No. 60/263,102, U.S. Ser.
No. 60/262,662, U.S. Ser. No. 60/263,064, U.S. Ser. No. 60/263,330, U.S. Ser. No. 60/263,065, U.S.
Ser. No. 60/263,329, U.S. Ser. No. 60/262,207, U.S. Ser. No. 60/262,209, U.S. Ser. No. 60/262,208, U.S. Ser. No. 60/262,164, U.S. Ser. No. 60/262,215, U.S. Ser. No. 60/263,063, U.S. Ser. No.
60/261,864, U.S. Ser. No. 60/262J60, U.S. Ser. No. 60/261,622, U.S. Ser. No. 60/263,077, and U.S.
Ser. No. 60/263,069 are hereby expressly incorporated by reference.
EXAMPLES
I. Construction of cDNA Libraries
RNA was purchased from CLONTECH Laboratories, Inc. (Palo Alto CA) or isolated from various tissues. Some tissues were homogenized and lysed in guanidinium isothiocyanate, while others were homogenized and lysed in phenol or in a suitable mixture of denaturants, such as TRIZOL (Life Technologies), a monophasic solution of phenol and guanidine isothiocyanate. The resulting lysates were centrifuged over CsCl cushions or extracted with chloroform. RNA was precipitated with either isopropanol or sodium acetate and ethanol, or by other routine methods.
Phenol extraction and precipitation of RNA were repeated as necessary to increase RNA purity. In most cases, RNA was treated with DNase. For most libraries, poly(A+) RNA was isolated using oligo d(T)-coupled paramagnetic particles (Promega Corporation (Promega), Madison WI), OLIGOTEX latex particles (QIAGEN, Inc. (QIAGEN), Valencia CA), or an OLIGOTEX mRNA purification kit (QIAGEN). Alternatively, RNA was isolated directly from tissue lysates using other RNA isolation kits, e.g., the POLY(A)PURE mRNA purification kit (Ambion, Inc., Austin TX).
In some cases, Stratagene was provided with RNA and constructed the corresponding cDNA libraries. Otherwise, cDNA was synthesized and cDNA libraries were constructed with the UNIZAP vector system (Stratagene Cloning Systems, Inc. (Stratagene), La Jolla CA) or SUPERSCRIPT plasmid system (Life Technologies), using the recommended procedures or similar methods known in the art. (See, e.g., Ausubel, 1997, supra, Chapters 5.1 through 6.6.) Reverse transcription was initiated using oligo d(T) or random primers. Synthetic oligonucleotide adapters were ligated to double stranded cDNA, and the cDNA was digested with the appropriate restriction enzyme or enzymes. For most libraries, the cDNA was size-selected (300-1000 bp) using SEPHACRYL S1000, SEPHAROSE CL2B, or SEPHAROSE CL4B column chromatography (Amersham Pharmacia Biotech) or preparative agarose gel electrophoresis. cDNAs were ligated into compatible restriction enzyme sites of the polylinker of a suitable plasmid, e.g., PBLUESCRIPT plasmid (Stratagene), PSPORT1 plasmid (Life Technologies), PCDNA2.1 plasmid (Invitrogen, Carlsbad CA), PBK-CMV plasmid (Stratagene), PCR2-TOPOTA plasmid (Invitrogen), PCMV-ICIS plasmid (Stratagene), pIGEN (Incyte Genomics, Palo Alto CA), pRARE (Incyte Genomics), or pINCY (Incyte Genomics), or derivatives thereof. Recombinant plasmids were transformed into competent E. coli cells including XL1-Blue, XL1-BlueMRF, or SOLR from Stratagene or DH5α, DH10B, or ElectroMAX DH10B from Life Technologies.
II. Isolation of cDNA Clones
Plasmids were recovered from host cells by in vivo excision using the UNIZAP vector system (Stratagene) or by cell lysis. Plasmids were purified using at least one of the following: the Magic or WIZARD Minipreps DNA purification system (Promega); the AGTC Miniprep purification kit (Edge BioSystems, Gaithersburg MD); and the QIAWELL 8, QIAWELL 8 Plus, and QIAWELL 8 Ultra plasmid purification systems or the R.E.A.L. PREP 96 plasmid purification kit (QIAGEN). Following precipitation, plasmids were resuspended in 0.1 ml of distilled water and stored, with or without lyophilization, at 4°C.
Alternatively, plasmid DNA was amplified from host cell lysates using direct link PCR in a high-throughput format. (Rao, V.B. (1994) Anal. Biochem. 216:1-14.) Host cell lysis and thermal cycling steps were carried out in a single reaction mixture. Samples were processed and stored in 384-well plates, and the concentration of amplified plasmid DNA was quantified fluorometrically using PICOGREEN dye (Molecular Probes, Inc. (Molecular Probes), Eugene OR) and a FLUOROSKAN II fluorescence scanner (Labsystems Oy, Helsinki, Finland).
III. Sequencing and Analysis
cDNA sequencing reactions were processed using standard methods or high-throughput instrumentation such as the ABI CATALYST 800 thermal cycler (Applied Biosystems) or the PTC-200 thermal cycler (MJ Research) in conjunction with the HYDRA microdispenser (Robbins Scientific Corp., Sunnyvale CA) or the MICROLAB 2200 liquid transfer system (Hamilton). cDNA sequencing reactions were prepared using reagents provided by Amersham Pharmacia Biotech or supplied in ABI sequencing kits such as the ABI PRISM BIGDYE Terminator cycle sequencing ready reaction kit (Applied Biosystems). Electrophoretic separation of cDNA sequencing reactions and detection of labeled polynucleotides were carried out using the MEGABACE 1000 DNA sequencing system (Molecular Dynamics); the ABI PRISM 373 or 377 sequencing system (Applied Biosystems) in conjunction with standard ABI protocols and base calling software; or other sequence analysis systems known in the art. Reading frames within the cDNA sequences were identified using standard methods (reviewed in Ausubel, 1997, supra, Chapter 7.7). Some of the cDNA sequences were selected for extension using the techniques disclosed in Example VIII.
IV. Assembly and Analysis of Sequences
Component sequences from chromatograms were subject to PHRED analysis and assigned a quality score. The sequences having at least a required quality score were subject to various preprocessing editing pathways to eliminate, e.g., low quality 3' ends, vector and linker sequences, polyA tails, Alu repeats, mitochondrial and ribosomal sequences, bacterial contamination sequences, and sequences smaller than 50 base pairs. In particular, low-information sequences and repetitive elements (e.g., dinucleotide repeats, Alu repeats, etc.) were replaced by "n's", or masked, to prevent spurious matches.
Processed sequences were then subject to assembly procedures in which the sequences were assigned to gene bins (bins). Each sequence could only belong to one bin. Sequences in each gene bin were assembled to produce consensus sequences (templates). Subsequent new sequences were added to existing bins using BLASTn (v.1.4 WashU) and CROSSMATCH. Candidate pairs were identified as all BLAST hits having a quality score greater than or equal to 150. Alignments of at least 82% local identity were accepted into the bin. The component sequences from each bin were assembled using a version of PHRAP. Bins with several overlapping component sequences were assembled using DEEP PHRAP. The orientation (sense or antisense) of each assembled template was determined based on the number and orientation of its component sequences. Template sequences as disclosed in the sequence listing correspond to sense strand sequences (the "forward" reading frames), to the best determination. The complementary (antisense) strands are inherently disclosed herein. The component sequences which were used to assemble each template consensus sequence are listed in Table 5, along with their positions along the template nucleotide sequences.
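The two acceptance thresholds stated above (a BLAST quality score of at least 150 and at least 82% local identity) can be illustrated with a minimal sketch. The `Hit` record, the `assign_bin` function, and the tie-breaking rule are hypothetical and are not taken from the assembly software named in the text; the source states only that a sequence may belong to one bin.

```python
from dataclasses import dataclass

# Thresholds quoted in the text.
MIN_BLAST_SCORE = 150       # candidate pairs: BLAST quality score >= 150
MIN_LOCAL_IDENTITY = 82.0   # alignments of at least 82% local identity


@dataclass
class Hit:
    """One BLAST hit of a new sequence against a bin (hypothetical layout)."""
    bin_id: str
    blast_score: float
    local_identity: float  # percent


def assign_bin(hits):
    """Return the id of the single bin a new sequence joins, or None.

    Assumption: when several bins pass both thresholds, the hit with the
    highest BLAST score wins. The source does not say how such ties are
    resolved.
    """
    accepted = [
        h for h in hits
        if h.blast_score >= MIN_BLAST_SCORE
        and h.local_identity >= MIN_LOCAL_IDENTITY
    ]
    if not accepted:
        return None
    return max(accepted, key=lambda h: h.blast_score).bin_id
```

A sequence with no hit passing both thresholds returns `None`, which in this sketch would correspond to starting a new bin.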
Bins were compared against each other and those having local similarity of at least 82% were combined and reassembled. Reassembled bins having templates of insufficient overlap (less than 95% local identity) were re-split. Assembled templates were also subject to analysis by STITCHER/EXON MAPPER algorithms which analyze the probabilities of the presence of splice variants, alternatively spliced exons, splice junctions, differential expression of alternative spliced genes across tissue types or disease states, etc. These resulting bins were subject to several rounds of the above assembly procedures.
Once gene bins were generated based upon sequence alignments, bins were clone joined based upon clone information. If the 5' sequence of one clone was present in one bin and the 3' sequence from the same clone was present in a different bin, it was likely that the two bins actually belonged together in a single bin. The resulting combined bins underwent assembly procedures to regenerate the consensus sequences.
The final assembled templates were subsequently annotated using the following procedure. Template sequences were analyzed using BLASTn (v2.0, NCBI) versus gbpri (GenBank version 126). "Hits" were defined as an exact match having from 95% local identity over 200 base pairs through 100% local identity over 100 base pairs, or a homolog match having an E-value, i.e. a probability score, of ≤ 1 x 10^-8. The hits were subject to frameshift FASTx versus GENPEPT (GenBank version 126). (See Table 8). In this analysis, a homolog match was defined as having an E-value of < 1 x 10^-8. The assembly method used above was described in "System and Methods for Analyzing Biomolecular Sequences," U.S.S.N. 09/276,534, filed March 25, 1999, and the LIFESEQ Gold user manual (Incyte), both incorporated by reference herein.
Following assembly, template sequences were subjected to motif, BLAST, and functional analyses, and categorized in protein hierarchies using methods described in, e.g., "Database System Employing Protein Function Hierarchies for Viewing Biomolecular Sequence Data," U.S.S.N. 08/812,290, filed March 6, 1997; "Relational Database for Storing Biomolecule Information," U.S.S.N. 08/947,845, filed October 9, 1997; "Project-Based Full-Length Biomolecular Sequence Database," U.S.S.N. 08/811,758, filed March 6, 1997; and "Relational Database and System for Storing Information Relating to Biomolecular Sequences," U.S.S.N. 09/034,807, filed March 4, 1998, all of which are incorporated by reference herein.
The template sequences were further analyzed by translating each template in all three forward reading frames and searching each translation against the Pfam database of hidden Markov model-based protein families and domains using the HMMER software package (available to the public from Washington University School of Medicine, St. Louis MO). Regions of templates which, when translated, contain similarity to Pfam consensus sequences are reported in Table 3, along with descriptions of Pfam protein domains and families. Only those Pfam hits with an E-value of ≤ 1 x 10^-3 are reported. (See also World Wide Web site http://pfam.wustl.edu/ for detailed descriptions of Pfam protein domains and families.) Additionally, the template sequences were translated in all three forward reading frames, and each translation was searched against hidden Markov models for signal peptides using the HMMER software package. Construction of hidden Markov models and their usage in sequence analysis has been described. (See, for example, Eddy, S.R. (1996) Curr. Opin. Str. Biol. 6:361-365.) Only those signal peptide hits with a cutoff score of 11 bits or greater are reported. A cutoff score of 11 bits or greater corresponds to at least about 91-94% true-positives in signal peptide prediction. Template sequences were also translated in all three forward reading frames, and each translation was searched against TMHMMER, a program that uses a hidden Markov model (HMM) to delineate transmembrane segments on protein sequences and determine orientation (Sonnhammer, E.L. et al. (1998) Proc. Sixth Intl. Conf. On Intelligent Systems for Mol. Biol., Glasgow et al., eds., The Am. Assoc. for Artificial Intelligence (AAAI) Press, Menlo Park, CA, and MIT Press, Cambridge, MA, pp. 175-182.) Regions of templates which, when translated, contain similarity to signal peptide or transmembrane consensus sequences are reported in Table 4.
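Translation in all three forward reading frames, as used above before each HMM search, can be sketched as follows. This is an illustrative sketch only, assuming the standard genetic code; the function names are hypothetical, and mapping any codon containing a masked or ambiguous base to "X" is an assumption rather than something the text specifies.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, frame):
    """Translate seq starting at offset frame (0, 1, or 2).

    Stop codons are written as '*'. Codons containing anything other than
    A, C, G, or T (for example masked 'n' bases) become 'X' (assumption).
    A trailing partial codon is ignored.
    """
    seq = seq.upper()
    residues = []
    for i in range(frame, len(seq) - 2, 3):
        residues.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(residues)


def three_forward_frames(seq):
    """Return the translations of the three forward reading frames."""
    return [translate(seq, frame) for frame in range(3)]
```

Each of the three returned strings would then be searched separately against the Pfam, signal peptide, or transmembrane models.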
The results of HMMER analysis as reported in Tables 3 and 4 may support the results of BLAST analysis as reported in Table 2 or may suggest alternative or additional properties of template-encoded polypeptides not previously uncovered by BLAST or other analyses.
Template sequences are further analyzed using the bioinformatics tools listed in Table 8, or using sequence analysis software known in the art such as MACDNASIS PRO software (Hitachi Software Engineering, South San Francisco CA) and LASERGENE software (DNASTAR). Template sequences may be further queried against public databases such as the GenBank rodent, mammalian, vertebrate, prokaryote, and eukaryote databases.
The template sequences were translated to derive the corresponding longest open reading frame as presented by the polypeptide sequences as reported in Table 7. Alternatively, a polypeptide of the invention may begin at any of the methionine residues within the full length translated polypeptide. Polypeptide sequences were subsequently analyzed by querying against the GenBank protein database (GENPEPT, (GenBank version 126)). Full length polynucleotide sequences are also analyzed using MACDNASIS PRO software (Hitachi Software Engineering, South San Francisco CA) and LASERGENE software (DNASTAR). Polynucleotide and polypeptide sequence alignments are generated using default parameters specified by the CLUSTAL algorithm as incorporated into the MEGALIGN multisequence alignment program (DNASTAR), which also calculates the percent identity between aligned sequences.
Table 7 shows sequences with homology to the polypeptides of the invention as identified by BLAST analysis against the GenBank protein (GENPEPT) database. Column 1 shows the polypeptide sequence identification number (SEQ ID NO:) for the polypeptide segments of the invention. Column 2 shows the reading frame used in the translation of the polynucleotide sequences encoding the polypeptide segments. Column 3 shows the length of the translated polypeptide segments. Columns 4 and 5 show the start and stop nucleotide positions of the polynucleotide sequences encoding the polypeptide segments. Column 6 shows the GenBank identification number (GI Number) of the nearest GenBank homolog. Column 7 shows the probability score for the match between each polypeptide and its GenBank homolog. Column 8 shows the annotation of the GenBank homolog.
V. Analysis of Polynucleotide Expression
Northern analysis is a laboratory technique used to detect the presence of a transcript of a gene and involves the hybridization of a labeled nucleotide sequence to a membrane on which RNAs from a particular cell type or tissue have been bound. (See, e.g., Sambrook, supra, ch. 7; Ausubel, 1995, supra, ch. 4 and 16.)
Analogous computer techniques applying BLAST were used to search for identical or related molecules in cDNA databases such as GenBank or LIFESEQ (Incyte Genomics). This analysis is much faster than multiple membrane-based hybridizations. In addition, the sensitivity of the computer search can be modified to determine whether any particular match is categorized as exact or similar. The basis of the search is the product score, which is defined as:
(BLAST Score x Percent Identity) / (5 x minimum {length(Seq. 1), length(Seq. 2)})
The product score takes into account both the degree of similarity between two sequences and the length of the sequence match. The product score is a normalized value between 0 and 100, and is calculated as follows: the BLAST score is multiplied by the percent nucleotide identity and the product is divided by (5 times the length of the shorter of the two sequences). The BLAST score is calculated by assigning a score of +5 for every base that matches in a high-scoring segment pair (HSP), and -4 for every mismatch. Two sequences may share more than one HSP (separated by gaps). If there is more than one HSP, then the pair with the highest BLAST score is used to calculate the product score. The product score represents a balance between fractional overlap and quality in a BLAST alignment. For example, a product score of 100 is produced only for 100% identity over the entire length of the shorter of the two sequences being compared. A product score of 70 is produced either by 100% identity and 70% overlap at one end, or by 88% identity and 100% overlap at the other. A product score of 50 is produced either by 100% identity and 50% overlap at one end, or 79% identity and 100% overlap.
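The product score calculation described above can be written out as a short sketch. The function names are hypothetical; the +5/-4 scoring and the normalization by 5 times the shorter sequence length are taken directly from the text.

```python
def blast_score(matches, mismatches):
    """Score one high-scoring segment pair: +5 per match, -4 per mismatch."""
    return 5 * matches - 4 * mismatches


def product_score(score, percent_identity, len_seq1, len_seq2):
    """Product score as defined in the text.

    score            -- BLAST score of the best HSP
    percent_identity -- percent nucleotide identity (0 to 100)
    len_seq1/2       -- lengths of the two sequences being compared
    """
    shorter = min(len_seq1, len_seq2)
    if shorter <= 0:
        raise ValueError("sequence lengths must be positive")
    return score * percent_identity / (5 * shorter)
```

For a 100-base query matched perfectly along its whole length, `product_score(blast_score(100, 0), 100.0, 100, 250)` gives 100, and a perfect match over 70 of the 100 bases gives 70. Under the same scoring rules, 88% identity over the full length evaluates to roughly 69, which is in line with the approximate value of 70 quoted in the text.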
VI. Tissue Distribution Profiling
A tissue distribution profile is determined for each template by compiling the cDNA library tissue classifications of its component cDNA sequences. Each component sequence is derived from a cDNA library constructed from a human tissue. Each human tissue is classified into one of the following categories: cardiovascular system; connective tissue; digestive system; embryonic structures; endocrine system; exocrine glands; genitalia, female; genitalia, male; germ cells; hemic and immune system; liver; musculoskeletal system; nervous system; pancreas; respiratory system; sense organs; skin; stomatognathic system; unclassified/mixed; or urinary tract. Template sequences, component sequences, and cDNA library/tissue information are found in the LIFESEQ GOLD database (Incyte Genomics, Palo Alto CA).
Table 6 shows the tissue distribution profile for the templates of the invention. For each template, the three most frequently observed tissue categories are shown in column 3, along with the percentage of component sequences belonging to each category. Only tissue categories with percentage values of ≥ 10% are shown. A tissue distribution of "widely distributed" in column 3 indicates percentage values of <10% in all tissue categories.
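The reporting rule for Table 6 (the three most frequent tissue categories, shown only when they reach 10%) can be sketched as below. The function names and the input format, one tissue category label per component sequence, are assumptions made for illustration.

```python
from collections import Counter


def tissue_profile(component_tissues, top_n=3, min_percent=10.0):
    """Return [(category, percent), ...] for the most frequent categories.

    component_tissues -- one tissue category label per component sequence
    Only the top_n categories with a share of at least min_percent are kept.
    An empty list means no category reaches the threshold.
    """
    counts = Counter(component_tissues)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("a template needs at least one component sequence")
    profile = []
    for category, n in counts.most_common(top_n):
        percent = 100.0 * n / total
        if percent >= min_percent:
            profile.append((category, percent))
    return profile


def describe(profile):
    """Render a profile the way column 3 is described in the text."""
    if not profile:
        return "widely distributed"
    return "; ".join(f"{cat} ({pct:.0f}%)" for cat, pct in profile)
```

Because the most frequent category is checked first, an empty result implies that every category falls below 10%, which matches the "widely distributed" label.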
VII. Transcript Image Analysis
Transcript images are generated as described in Seilhamer et al., "Comparative Gene Transcript Analysis," U.S. Patent Number 5,840,484, incorporated herein by reference.
VIII. Extension of Polynucleotide Sequences and Isolation of a Full-length cDNA
Oligonucleotide primers designed using a dithp of the Sequence Listing are used to extend the nucleic acid sequence. One primer is synthesized to initiate 5' extension of the template, and the other primer, to initiate 3' extension of the template. The initial primers may be designed using OLIGO 4.06 software (National Biosciences, Inc. (National Biosciences), Plymouth MN), or another appropriate program, to be about 22 to 30 nucleotides in length, to have a GC content of about 50% or more, and to anneal to the target sequence at temperatures of about 68 °C to about 72 °C. Any stretch of nucleotides which would result in hairpin structures and primer-primer dimerizations is avoided. Selected human cDNA libraries are used to extend the sequence. If more than one extension is necessary or desired, additional or nested sets of primers are designed.
High fidelity amplification is obtained by PCR using methods well known in the art. PCR is performed in 96-well plates using the PTC-200 thermal cycler (MJ Research). The reaction mix contains DNA template, 200 nmol of each primer, reaction buffer containing Mg2+, (NH4)2SO4, and β-mercaptoethanol, Taq DNA polymerase (Amersham Pharmacia Biotech), ELONGASE enzyme (Life Technologies), and Pfu DNA polymerase (Stratagene), with the following parameters for primer pair PCI A and PCI B: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 60°C, 1 min; Step 4: 68°C, 2 min; Step 5: Steps 2, 3, and 4 repeated 20 times; Step 6: 68°C, 5 min; Step 7: storage at 4°C. In the alternative, the parameters for primer pair T7 and SK+ are as follows: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 57°C, 1 min; Step 4: 68°C, 2 min; Step 5: Steps 2, 3, and 4 repeated 20 times; Step 6: 68°C, 5 min; Step 7: storage at 4°C.
The concentration of DNA in each well is determined by dispensing 100 μl PICOGREEN quantitation reagent (0.25% (v/v); Molecular Probes) dissolved in 1X Tris-EDTA (TE) and 0.5 μl of undiluted PCR product into each well of an opaque fluorimeter plate (Corning Incorporated (Corning), Corning NY), allowing the DNA to bind to the reagent. The plate is scanned in a FLUOROSKAN II (Labsystems Oy) to measure the fluorescence of the sample and to quantify the concentration of DNA. A 5 μl to 10 μl aliquot of the reaction mixture is analyzed by electrophoresis on a 1% agarose mini-gel to determine which reactions are successful in extending the sequence.
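The cycling parameters listed above for primer pair PCI A and PCI B can be captured as a small data structure, which makes the total programmed hold time easy to estimate. This is a sketch only: the dictionary layout and function name are hypothetical, ramp times between temperatures are ignored, and whether "repeated 20 times" means 20 or 21 passes in total is an interpretation that the flag below leaves open.

```python
# Step temperatures (°C) and hold times (seconds) as listed in the text.
PCI_PROGRAM = {
    "initial_denaturation": (94, 180),                 # Step 1
    "cycle": [(94, 15), (60, 60), (68, 120)],          # Steps 2 to 4
    "repeats": 20,                                     # Step 5
    "final_extension": (68, 300),                      # Step 6
}


def hold_time_seconds(program, count_first_pass=True):
    """Sum the programmed hold times, excluding ramping and 4°C storage.

    count_first_pass -- assumption: if True, Steps 2 to 4 run once and are
    then repeated `repeats` times (repeats + 1 passes in total).
    """
    cycle_time = sum(seconds for _, seconds in program["cycle"])
    passes = program["repeats"] + (1 if count_first_pass else 0)
    return (
        program["initial_denaturation"][1]
        + passes * cycle_time
        + program["final_extension"][1]
    )
```

Under the first interpretation the programmed holds add up to 4575 seconds, a little over 76 minutes. The T7 and SK+ program differs only in its 57°C annealing temperature, so its hold time is the same.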
The extended nucleotides are desalted and concentrated, transferred to 384-well plates, digested with CviJI cholera virus endonuclease (Molecular Biology Research, Madison WI), and sonicated or sheared prior to religation into pUC 18 vector (Amersham Pharmacia Biotech). For shotgun sequencing, the digested nucleotides are separated on low concentration (0.6 to 0.8%) agarose gels, fragments are excised, and agar digested with AGAR ACE (Promega). Extended clones are religated using T4 ligase (New England Biolabs, Inc., Beverly MA) into pUC 18 vector (Amersham Pharmacia Biotech), treated with Pfu DNA polymerase (Stratagene) to fill-in restriction site overhangs, and transfected into competent E. coli cells. Transformed cells are selected on antibiotic-containing media, individual colonies are picked and cultured overnight at 37 °C in 384-well plates in LB/2x carbenicillin liquid media.
The cells are lysed, and DNA is amplified by PCR using Taq DNA polymerase (Amersham Pharmacia Biotech) and Pfu DNA polymerase (Stratagene) with the following parameters: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 60°C, 1 min; Step 4: 72°C, 2 min; Step 5: steps 2, 3, and 4 repeated 29 times; Step 6: 72°C, 5 min; Step 7: storage at 4°C. DNA is quantified by PICOGREEN reagent (Molecular Probes) as described above. Samples with low DNA recoveries are reamplified using the same conditions as described above. Samples are diluted with 20% dimethylsulfoxide (1:2, v/v), and sequenced using DYENAMIC energy transfer sequencing primers and the DYENAMIC DIRECT kit (Amersham Pharmacia Biotech) or the ABI PRISM BIGDYE Terminator cycle sequencing ready reaction kit (Applied Biosystems).
In like manner, the dithp is used to obtain regulatory sequences (promoters, introns, and enhancers) using the procedure above, oligonucleotides designed for such extension, and an appropriate genomic library.
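Two of the primer design criteria given earlier in this example, a length of about 22 to 30 nucleotides and a GC content of about 50% or more, are simple to check mechanically. The sketch below is illustrative and is not the OLIGO 4.06 method: the function names are hypothetical, the "about" in the text is treated as hard cutoffs, and annealing temperature, hairpins, and primer-primer dimers are deliberately not evaluated.

```python
def gc_percent(primer):
    """Percentage of G and C bases in a primer sequence."""
    if not primer:
        raise ValueError("primer must not be empty")
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def meets_basic_criteria(primer, min_len=22, max_len=30, min_gc=50.0):
    """Check only the length and GC-content criteria stated in the text.

    Assumption: the approximate limits in the text are applied as exact
    thresholds here; a real design tool would treat them more flexibly.
    """
    return min_len <= len(primer) <= max_len and gc_percent(primer) >= min_gc
```

A candidate that passes this screen would still need its annealing temperature and secondary structure checked by a dedicated primer design program.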
IX. Labeling of Probes and Southern Hybridization Analyses
Hybridization probes derived from the dithp of the Sequence Listing are employed for screening cDNAs, mRNAs, or genomic DNA. The labeling of probe nucleotides between 100 and 1000 nucleotides in length is specifically described, but essentially the same procedure may be used with larger cDNA fragments. Probe sequences are labeled at room temperature for 30 minutes using a T4 polynucleotide kinase, γ32P-ATP, and 0.5X One-Phor-All Plus (Amersham Pharmacia Biotech) buffer and purified using a ProbeQuant G-50 Microcolumn (Amersham Pharmacia Biotech). The probe mixture is diluted to 10^7 dpm/μg/ml hybridization buffer and used in a typical membrane-based hybridization analysis.
The DNA is digested with a restriction endonuclease such as Eco RV and is electrophoresed through a 0.7% agarose gel. The DNA fragments are transferred from the agarose to nylon membrane (NYTRAN Plus, Schleicher & Schuell, Inc., Keene NH) using procedures specified by the manufacturer of the membrane. Prehybridization is carried out for three or more hours at 68 °C, and hybridization is carried out overnight at 68 °C. To remove non-specific signals, blots are sequentially washed at room temperature under increasingly stringent conditions, up to 0.1x saline sodium citrate (SSC) and 0.5% sodium dodecyl sulfate. After the blots are placed in a PHOSPHORIMAGER cassette (Molecular Dynamics) or are exposed to autoradiography film, hybridization patterns of standard and experimental lanes are compared. Essentially the same procedure is employed when screening RNA.
X. Chromosome Mapping of dithp
The cDNA sequences which were used to assemble SEQ ID NO:1-56 are compared with sequences from the Incyte LIFESEQ database and public domain databases using BLAST and other implementations of the Smith-Waterman algorithm. Sequences from these databases that match SEQ ID NO:1-56 are assembled into clusters of contiguous and overlapping sequences using assembly algorithms such as PHRAP (Table 8). Radiation hybrid and genetic mapping data available from public resources such as the Stanford Human Genome Center (SHGC), Whitehead Institute for Genome Research (WIGR), and Genethon are used to determine if any of the clustered sequences have been previously mapped. Inclusion of a mapped sequence in a cluster will result in the assignment of all sequences of that cluster, including its particular SEQ ID NO:, to that map location.
The genetic map locations of SEQ ID NO:1-56 are described as ranges, or intervals, of human chromosomes. The map position of an interval, in centiMorgans, is measured relative to the terminus of the chromosome's p-arm. (The centiMorgan (cM) is a unit of measurement based on recombination frequencies between chromosomal markers. On average, 1 cM is roughly equivalent to 1 megabase (Mb) of DNA in humans, although this can vary widely due to hot and cold spots of recombination.) The cM distances are based on genetic markers mapped by Genethon which provide boundaries for radiation hybrid markers whose sequences were included in each of the clusters.
XI. Microarray Analysis
Probe Preparation from Tissue or Cell Samples
Total RNA is isolated from tissue samples using the guanidinium thiocyanate method and polyA+ RNA is purified using the oligo (dT) cellulose method. Each polyA+ RNA sample is reverse transcribed using MMLV reverse-transcriptase, 0.05 pg/μl oligo-dT primer (21mer), 1X first strand buffer, 0.03 units/μl RNase inhibitor, 500 μM dATP, 500 μM dGTP, 500 μM dTTP, 40 μM dCTP, 40 μM dCTP-Cy3 (BDS) or dCTP-Cy5 (Amersham Pharmacia Biotech). The reverse transcription reaction is performed in a 25 ml volume containing 200 ng polyA+ RNA with GEMBRIGHT kits (Incyte). Specific control polyA+ RNAs are synthesized by in vitro transcription from non-coding yeast genomic DNA (W. Lei, unpublished). As quantitative controls, the control mRNAs at 0.002 ng, 0.02 ng, 0.2 ng, and 2 ng are diluted into reverse transcription reaction at ratios of 1:100,000, 1:10,000, 1:1000, 1:100 (w/w) to sample mRNA respectively. The control mRNAs are diluted into reverse transcription reaction at ratios of 1:3, 3:1, 1:10, 10:1, 1:25, 25:1 (w/w) to sample mRNA differential expression patterns. After incubation at 37°C for 2 hr, each reaction sample (one with Cy3 and another with Cy5 labeling) is treated with 2.5 ml of 0.5M sodium hydroxide and incubated for 20 minutes at 85°C to stop the reaction and degrade the RNA. Probes are purified using two successive CHROMA SPIN 30 gel filtration spin columns (CLONTECH Laboratories, Inc. (CLONTECH), Palo Alto CA) and after combining, both reaction samples are ethanol precipitated using 1 ml of glycogen (1 mg/ml), 60 ml sodium acetate, and 300 ml of 100% ethanol. The probe is then dried to completion using a SpeedVAC (Savant Instruments Inc., Holbrook NY) and resuspended in 14 μl 5X SSC/0.2% SDS.
Microarray Preparation
Sequences of the present invention are used to generate array elements. Each array element is amplified from bacterial cells containing vectors with cloned cDNA inserts. PCR amplification uses primers complementary to the vector sequences flanking the cDNA insert. Array elements are amplified in thirty cycles of PCR from an initial quantity of 1-2 ng to a final quantity greater than 5 μg. Amplified array elements are then purified using SEPHACRYL-400 (Amersham Pharmacia Biotech). Purified array elements are immobilized on polymer-coated glass slides. Glass microscope slides (Corning) are cleaned by ultrasound in 0.1% SDS and acetone, with extensive distilled water washes between and after treatments. Glass slides are etched in 4% hydrofluoric acid (VWR Scientific Products Corporation (VWR), West Chester, PA), washed extensively in distilled water, and coated with 0.05% aminopropyl silane (Sigma) in 95% ethanol. Coated slides are cured in a 110°C oven.
Array elements are applied to the coated glass substrate using a procedure described in US Patent No. 5,807,522, incorporated herein by reference. 1 μl of the array element DNA, at an average concentration of 100 ng/μl, is loaded into the open capillary printing element by a high-speed robotic apparatus. The apparatus then deposits about 5 nl of array element sample per slide.
Microarrays are UV-crosslinked using a STRATALINKER UV-crosslinker (Stratagene). Microarrays are washed at room temperature once in 0.2% SDS and three times in distilled water. Non-specific binding sites are blocked by incubation of microarrays in 0.2% casein in phosphate buffered saline (PBS) (Tropix, Inc., Bedford, MA) for 30 minutes at 60°C followed by washes in 0.2% SDS and distilled water as before.
Hybridization
Hybridization reactions contain 9 μl of probe mixture consisting of 0.2 μg each of Cy3 and Cy5 labeled cDNA synthesis products in 5X SSC, 0.2% SDS hybridization buffer. The probe mixture is heated to 65°C for 5 minutes and is aliquoted onto the microarray surface and covered with a 1.8 cm2 coverslip. The arrays are transferred to a waterproof chamber having a cavity just slightly larger than a microscope slide. The chamber is kept at 100% humidity internally by the addition of 140 μl of 5x SSC in a corner of the chamber. The chamber containing the arrays is incubated for about 6.5 hours at 60°C. The arrays are washed for 10 min at 45°C in a first wash buffer (1X SSC, 0.1% SDS), three times for 10 minutes each at 45°C in a second wash buffer (0.1X SSC), and dried.
Detection
Reporter-labeled hybridization complexes are detected with a microscope equipped with an Innova 70 mixed gas 10 W laser (Coherent, Inc., Santa Clara CA) capable of generating spectral lines at 488 nm for excitation of Cy3 and at 632 nm for excitation of Cy5. The excitation laser light is focused on the array using a 20X microscope objective (Nikon, Inc., Melville NY). The slide containing the array is placed on a computer-controlled X-Y stage on the microscope and raster-scanned past the objective. The 1.8 cm x 1.8 cm array used in the present example is scanned with a resolution of 20 micrometers. In two separate scans, a mixed gas multiline laser excites the two fluorophores sequentially.
Emitted light is split, based on wavelength, into two photomultiplier tube detectors (PMT R1477, Hamamatsu Photonics Systems, Bridgewater NJ) corresponding to the two fluorophores. Appropriate filters positioned between the array and the photomultiplier tubes are used to filter the signals. The emission maxima of the fluorophores used are 565 nm for Cy3 and 650 nm for Cy5. Each array is typically scanned twice, one scan per fluorophore using the appropriate filters at the laser source, although the apparatus is capable of recording the spectra from both fluorophores simultaneously. The sensitivity of the scans is typically calibrated using the signal intensity generated by a cDNA control species added to the probe mix at a known concentration. A specific location on the array contains a complementary DNA sequence, allowing the intensity of the signal at that location to be correlated with a weight ratio of hybridizing species of 1:100,000. When two probes from different sources (e.g., representing test and control cells), each labeled with a different fluorophore, are hybridized to a single array for the purpose of identifying genes that are differentially expressed, the calibration is done by labeling samples of the calibrating cDNA with the two fluorophores and adding identical amounts of each to the hybridization mixture.
The output of the photomultiplier tube is digitized using a 12-bit RTI-835H analog-to-digital (A/D) conversion board (Analog Devices, Inc., Norwood, MA) installed in an IBM-compatible PC computer. The digitized data are displayed as an image where the signal intensity is mapped using a linear 20-color transformation to a pseudocolor scale ranging from blue (low signal) to red (high signal). The data are also analyzed quantitatively. Where two different fluorophores are excited and measured simultaneously, the data are first corrected for optical crosstalk (due to overlapping emission spectra) between the fluorophores using each fluorophore's emission spectrum.
A grid is superimposed over the fluorescence signal image such that the signal from each spot is centered in each element of the grid. The fluorescence signal within each element is then integrated to obtain a numerical value corresponding to the average intensity of the signal. The software used for signal analysis is the GEMTOOLS gene expression analysis program (Incyte).
XII. Complementary Nucleic Acids
Sequences complementary to the dithp are used to detect, decrease, or inhibit expression of the naturally occurring nucleotide. The use of oligonucleotides comprising from about 15 to 30 base pairs is typical in the art. However, smaller or larger sequence fragments can also be used. Appropriate oligonucleotides are designed from the dithp using OLIGO 4.06 software (National Biosciences) or other appropriate programs and are synthesized using methods standard in the art or ordered from a commercial supplier. To inhibit transcription, a complementary oligonucleotide is designed from the most unique 5' sequence and used to prevent transcription factor binding to the promoter sequence. To inhibit translation, a complementary oligonucleotide is designed to prevent ribosomal binding and processing of the transcript.
XIII. Expression of DITHP
Expression and purification of DITHP is accomplished using bacterial or virus-based expression systems. For expression of DITHP in bacteria, cDNA is subcloned into an appropriate vector containing an antibiotic resistance gene and an inducible promoter that directs high levels of cDNA transcription. Examples of such promoters include, but are not limited to, the trp-lac (tac) hybrid promoter and the T5 or T7 bacteriophage promoter in conjunction with the lac operator regulatory element. Recombinant vectors are transformed into suitable bacterial hosts, e.g., BL21(DE3). Antibiotic resistant bacteria express DITHP upon induction with isopropyl beta-D-thiogalactopyranoside (IPTG). Expression of DITHP in eukaryotic cells is achieved by infecting insect or mammalian cell lines with recombinant Autographa californica nuclear polyhedrosis virus (AcMNPV), commonly known as baculovirus. The nonessential polyhedrin gene of baculovirus is replaced with cDNA encoding DITHP by either homologous recombination or bacterial-mediated transposition involving transfer plasmid intermediates. Viral infectivity is maintained and the strong polyhedrin promoter drives high levels of cDNA transcription. Recombinant baculovirus is used to infect Spodoptera frugiperda (Sf9) insect cells in most cases, or human hepatocytes, in some cases. Infection of the latter requires additional genetic modifications to baculovirus. (See e.g., Engelhard, supra; and Sandig, supra.)
In most expression systems, DITHP is synthesized as a fusion protein with, e.g., glutathione S-transferase (GST) or a peptide epitope tag, such as FLAG or 6-His, permitting rapid, single-step, affinity-based purification of recombinant fusion protein from crude cell lysates. GST, a 26-kilodalton enzyme from Schistosoma japonicum, enables the purification of fusion proteins on immobilized glutathione under conditions that maintain protein activity and antigenicity (Amersham Pharmacia Biotech). Following purification, the GST moiety can be proteolytically cleaved from DITHP at specifically engineered sites. FLAG, an 8-amino acid peptide, enables immunoaffinity purification using commercially available monoclonal and polyclonal anti-FLAG antibodies (Eastman Kodak Company, Rochester NY). 6-His, a stretch of six consecutive histidine residues, enables purification on metal-chelate resins (QIAGEN). Methods for protein expression and purification are discussed in Ausubel (1995, supra, Chapters 10 and 16). Purified DITHP obtained by these methods can be used directly in the following activity assay.
XIV. Demonstration of DITHP Activity
DITHP activity is demonstrated through a variety of specific assays, some of which are outlined below.
Oxidoreductase activity of DITHP is measured by the increase in extinction coefficient of NAD(P)H coenzyme at 340 nm for the measurement of oxidation activity, or the decrease in extinction coefficient of NAD(P)H coenzyme at 340 nm for the measurement of reduction activity (Dalziel, K. (1963) J. Biol. Chem. 238:2850-2858). One of three substrates may be used: Asn-βGal, biocytidine, or ubiquinone-10. The respective subunits of the enzyme reaction, for example, cytochrome c1-b oxidoreductase and cytochrome c, are reconstituted. The reaction mixture contains a) 1-2 mg/ml DITHP; and b) 15 mM substrate, 2.4 mM NAD(P)+ in 0.1 M phosphate buffer, pH 7.1 (oxidation reaction), or 2.0 mM NAD(P)H, in 0.1 M Na2HPO4 buffer, pH 7.4 (reduction reaction); in a total volume of 0.1 ml. Changes in absorbance at 340 nm (A340) are measured at 23.5 °C using a recording spectrophotometer (Shimadzu Scientific Instruments, Inc., Pleasanton CA). The amount of NAD(P)H is stoichiometrically equivalent to the amount of substrate initially present, and the change in A340 is a direct measure of the amount of NAD(P)H produced; ΔA340 = 6620[NADH]. Oxidoreductase activity of DITHP is proportional to the amount of NAD(P)H present in the assay.
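As a rough illustration of the relation ΔA340 = 6620[NADH] given above, the following Python sketch converts a measured absorbance change into an NAD(P)H concentration and amount. The function name, the 1 cm path length, and the example reading are assumptions made for illustration only. The coefficient 6620 M^-1 cm^-1 is taken from the text as written and is exposed as a parameter, since many references cite 6220 M^-1 cm^-1 for NADH at 340 nm.

```python
def nadph_from_delta_a340(delta_a340, epsilon=6620.0, path_cm=1.0, volume_ml=0.1):
    """Convert a change in A340 into NAD(P)H concentration and amount.

    epsilon   -- molar absorption coefficient in M^-1 cm^-1 (value from the text)
    path_cm   -- optical path length in cm (assumed 1 cm cuvette)
    volume_ml -- reaction volume in ml (0.1 ml in the assay described)
    Returns (concentration in mol/l, amount in micromoles).
    A negative delta_a340 (reduction reaction) yields negative values,
    i.e. NAD(P)H consumed rather than produced.
    """
    if epsilon <= 0 or path_cm <= 0 or volume_ml <= 0:
        raise ValueError("epsilon, path_cm and volume_ml must be positive")
    conc_molar = delta_a340 / (epsilon * path_cm)           # Beer-Lambert law
    micromoles = conc_molar * (volume_ml / 1000.0) * 1e6    # mol/l * l -> umol
    return conc_molar, micromoles


if __name__ == "__main__":
    conc, amount = nadph_from_delta_a340(0.662)  # hypothetical reading
    print(f"{conc:.2e} M NAD(P)H, {amount:.4f} umol in the 0.1 ml reaction")
```

Keeping the coefficient as an argument lets the same calculation be repeated with whichever value the laboratory has validated for its instrument.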
Transferase activity of DITHP is measured through assays such as a methyl transferase assay in which the transfer of radiolabeled methyl groups between a donor substrate and an acceptor substrate is measured (Bokar, J.A. et al. (1994) J. Biol. Chem. 269:17697-17704). Reaction mixtures (50 μl final volume) contain 15 mM HEPES, pH 7.9, 1.5 mM MgCl2, 10 mM dithiothreitol, 3% polyvinylalcohol, 1.5 μCi [methyl-3H]AdoMet (0.375 μM AdoMet) (DuPont-NEN), 0.6 μg DITHP, and acceptor substrate (0.4 μg [35S]RNA or 6-mercaptopurine (6-MP) to 1 mM final concentration). Reaction mixtures are incubated at 30 °C for 30 minutes, then 65 °C for 5 minutes. The products are separated by chromatography or electrophoresis and the level of methyl transferase activity is determined by quantification of methyl-3H recovery.
DITHP hydrolase activity is measured by the hydrolysis of appropriate synthetic peptide substrates conjugated with various chromogenic molecules in which the degree of hydrolysis is quantified by spectrophotometric (or fluorometric) absorption of the released chromophore (Beynon, R.J. and J.S. Bond (1994) Proteolytic Enzymes: A Practical Approach, Oxford University Press, New York NY, pp. 25-55). Peptide substrates are designed according to the category of protease activity as endopeptidase (serine, cysteine, aspartic proteases), aminopeptidase (leucine aminopeptidase), or carboxypeptidase (carboxypeptidases A and B, procollagen C-proteinase).
DITHP isomerase activity such as peptidyl prolyl cis/trans isomerase activity can be assayed by an enzyme assay described by Rahfeld, J.U., et al. (1994) (FEBS Lett. 352:180-184). The assay is performed at 10 °C in 35 mM HEPES buffer, pH 7.8, containing chymotrypsin (0.5 mg/ml) and DITHP at a variety of concentrations. Under these assay conditions, the substrate, Suc-Ala-Xaa-Pro-Phe-4-NA, is in equilibrium with respect to the prolyl bond, with 80-95% in trans and 5-20% in cis conformation. An aliquot (2 μl) of the substrate dissolved in dimethyl sulfoxide (10 mg/ml) is added to the reaction mixture described above. Only the cis isomer of the substrate is a substrate for cleavage by chymotrypsin. Thus, as the substrate is isomerized by DITHP, the product is cleaved by chymotrypsin to produce 4-nitroanilide, which is detected by its absorbance at 390 nm. 4-Nitroanilide appears in a time-dependent and a DITHP concentration-dependent manner.
An assay for DITHP activity associated with growth and development measures cell proliferation as the amount of newly initiated DNA synthesis in Swiss mouse 3T3 cells. A plasmid containing polynucleotides encoding DITHP is transfected into quiescent 3T3 cultured cells using methods well known in the art. The transiently transfected cells are then incubated in the presence of [3H]thymidine, a radioactive DNA precursor. Where applicable, varying amounts of DITHP ligand are added to the transfected cells. Incorporation of [3H]thymidine into acid-precipitable DNA is measured over an appropriate time interval, and the amount incorporated is directly proportional to the amount of newly synthesized DNA.
Growth factor activity of DITHP is measured by the stimulation of DNA synthesis in Swiss mouse 3T3 cells (McKay, I. and I. Leigh, eds. (1993) Growth Factors: A Practical Approach, Oxford University Press, New York NY). Initiation of DNA synthesis indicates the cells' entry into the mitotic cycle and their commitment to undergo later division. 3T3 cells are competent to respond to most growth factors, not only those that are mitogenic, but also those that are involved in embryonic induction. This competence is possible because the in vivo specificity demonstrated by some growth factors is not necessarily inherent but is determined by the responding tissue. In this assay, varying amounts of DITHP are added to quiescent 3T3 cultured cells in the presence of [3H]thymidine, a radioactive DNA precursor. DITHP for this assay can be obtained by recombinant means or from biochemical preparations. Incorporation of [3H]thymidine into acid-precipitable DNA is measured over an appropriate time interval, and the amount incorporated is directly proportional to the amount of newly synthesized DNA. A linear dose-response curve over at least a hundred-fold DITHP concentration range is indicative of growth factor activity.
One unit of activity per milliliter is defined as the concentration of DITHP producing a 50% response level, where 100% represents maximal incorporation of [3H]thymidine into acid-precipitable DNA.
Alternatively, an assay for cytokine activity of DITHP measures the proliferation of leukocytes. In this assay, the amount of tritiated thymidine incorporated into newly synthesized DNA is used to estimate proliferative activity. Varying amounts of DITHP are added to cultured leukocytes, such as granulocytes, monocytes, or lymphocytes, in the presence of [3H]thymidine, a radioactive DNA precursor. DITHP for this assay can be obtained by recombinant means or from biochemical preparations. Incorporation of [3H]thymidine into acid-precipitable DNA is measured over an appropriate time interval, and the amount incorporated is directly proportional to the amount of newly synthesized DNA. A linear dose-response curve over at least a hundred-fold DITHP concentration range is indicative of DITHP activity. One unit of activity per milliliter is conventionally defined as the concentration of DITHP producing a 50% response level, where 100% represents maximal incorporation of [3H]thymidine into acid-precipitable DNA.
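The unit definition above (the DITHP concentration giving a 50% response, where 100% is maximal [3H]thymidine incorporation) can be sketched as a simple interpolation. This is only one possible way to estimate the half-maximal point; the function name, the log-linear interpolation between neighboring data points, and the example numbers are assumptions, and the responses are assumed to be background-subtracted.

```python
import math


def half_max_concentration(concentrations, responses):
    """Estimate the concentration that gives 50% of the maximal response.

    concentrations -- positive DITHP concentrations (any consistent unit)
    responses      -- background-subtracted incorporation values (e.g. cpm)
    Interpolates linearly in response against log10(concentration) between
    the two neighboring points that bracket the 50% level.
    Returns None if the data never cross the 50% level on a rising segment.
    """
    if len(concentrations) != len(responses) or len(concentrations) < 2:
        raise ValueError("need two or more paired data points")
    if any(c <= 0 for c in concentrations):
        raise ValueError("concentrations must be positive")
    pairs = sorted(zip(concentrations, responses))
    max_response = max(r for _, r in pairs)
    if max_response <= 0:
        raise ValueError("maximal response must be positive")
    target = 0.5 * max_response
    for (c0, r0), (c1, r1) in zip(pairs, pairs[1:]):
        if r1 > r0 and r0 <= target <= r1:
            fraction = (target - r0) / (r1 - r0)
            log_c = math.log10(c0) + fraction * (math.log10(c1) - math.log10(c0))
            return 10 ** log_c
    return None


if __name__ == "__main__":
    # Hypothetical dilution series spanning a thousand-fold range.
    print(half_max_concentration([1, 10, 100, 1000], [0, 25, 75, 100]))
```

A fitted sigmoidal model would normally be preferred when enough data points are available; the interpolation here is only meant to make the definition concrete.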
An alternative assay for DITHP cytokine activity utilizes a Boyden micro chamber (Neuroprobe, Cabin John MD) to measure leukocyte chemotaxis (Vicari, supra). In this assay, about 10^5 migratory cells such as macrophages or monocytes are placed in cell culture media in the upper compartment of the chamber. Varying dilutions of DITHP are placed in the lower compartment. The two compartments are separated by a 5 or 8 micron pore polycarbonate filter (Nucleopore, Pleasanton CA). After incubation at 37 °C for 80 to 120 minutes, the filters are fixed in methanol and stained with appropriate labeling agents. Cells which migrate to the other side of the filter are counted using standard microscopy. The chemotactic index is calculated by dividing the number of migratory cells counted when DITHP is present in the lower compartment by the number of migratory cells counted when only media is present in the lower compartment. The chemotactic index is proportional to the activity of DITHP.
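The chemotactic index described above is a simple ratio, sketched below for a series of DITHP dilutions. The function names, the dilution labels, and the cell counts are hypothetical values chosen for illustration.

```python
def chemotactic_index(migrated_with_dithp, migrated_media_only):
    """Cells migrated toward DITHP divided by cells migrated toward media alone."""
    if migrated_media_only <= 0:
        raise ValueError("media-only count must be positive")
    return migrated_with_dithp / migrated_media_only


def index_by_dilution(counts_by_dilution, media_only_count):
    """Compute the chemotactic index for each DITHP dilution in a mapping."""
    return {
        dilution: chemotactic_index(count, media_only_count)
        for dilution, count in counts_by_dilution.items()
    }


if __name__ == "__main__":
    counts = {"1:10": 150, "1:100": 90, "1:1000": 55}  # hypothetical counts
    print(index_by_dilution(counts, media_only_count=50))
```

An index near 1 would indicate no migration above the media-only background.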
Alternatively, cell lines or tissues transformed with a vector containing dithp can be assayed for DITHP activity by immunoblotting. Cells are denatured in SDS in the presence of β-mercaptoethanol, nucleic acids removed by ethanol precipitation, and proteins purified by acetone precipitation. Pellets are resuspended in 20 mM Tris buffer at pH 7.5 and incubated with Protein G-Sepharose pre-coated with an antibody specific for DITHP. After washing, the Sepharose beads are boiled in electrophoresis sample buffer, and the eluted proteins subjected to SDS-PAGE. The SDS-PAGE gel is transferred to a nitrocellulose membrane for immunoblotting, and the DITHP activity is assessed by visualizing and quantifying bands on the blot using the antibody specific for DITHP as the primary antibody and 125I-labeled IgG specific for the primary antibody as the secondary antibody.
DITHP kinase activity is measured by phosphorylation of a protein substrate using γ-labeled [32P]-ATP and quantification of the incorporated radioactivity using a radioisotope counter. DITHP is incubated with the protein substrate, [32P]-ATP, and an appropriate kinase buffer. The [32P] incorporated into the product is separated from free [32P]-ATP by electrophoresis and the incorporated [32P] is counted. The amount of [32P] recovered is proportional to the kinase activity of DITHP in the assay. A determination of the specific amino acid residue phosphorylated is made by phosphoamino acid analysis of the hydrolyzed protein.
In the alternative, DITHP activity is measured by the increase in cell proliferation resulting from transformation of a mammalian cell line such as COS7, HeLa or CHO with a eukaryotic expression vector encoding DITHP. Eukaryotic expression vectors are commercially available, and the techniques to introduce them into cells are well known to those skilled in the art. The cells are incubated for 48-72 hours after transformation under conditions appropriate for the cell line to allow expression of DITHP. Phase microscopy is then used to compare the mitotic index of transformed versus control cells. An increase in the mitotic index indicates DITHP activity.
In a further alternative, an assay for DITHP signaling activity is based upon the ability of GPCR family proteins to modulate G protein-activated second messenger signal transduction pathways (e.g., cAMP; Gaudin, P. et al. (1998) J. Biol. Chem. 273:4990-4996). A plasmid encoding full length DITHP is transfected into a mammalian cell line (e.g., Chinese hamster ovary (CHO) or human embryonic kidney (HEK-293) cell lines) using methods well-known in the art. Transfected cells are grown in 12-well trays in culture medium for 48 hours, then the culture medium is discarded, and the attached cells are gently washed with PBS. The cells are then incubated in culture medium with or without ligand for 30 minutes, then the medium is removed and cells lysed by treatment with 1 M perchloric acid. The cAMP levels in the lysate are measured by radioimmunoassay using methods well-known in the art. Changes in the levels of cAMP in the lysate from cells exposed to ligand compared to those without ligand are proportional to the amount of DITHP present in the transfected cells.
Alternatively, an assay for DITHP protein phosphatase activity measures the hydrolysis of p-nitrophenyl phosphate (PNPP). DITHP is incubated together with PNPP in HEPES buffer pH 7.5, in the presence of 0.1% β-mercaptoethanol at 37 °C for 60 min. The reaction is stopped by the addition of 6 ml of 10 N NaOH, and the increase in light absorbance of the reaction mixture at 410 nm resulting from the hydrolysis of PNPP is measured using a spectrophotometer. The increase in light absorbance is proportional to the phosphatase activity of DITHP in the assay (Diamond, R.H. et al. (1994) Mol. Cell. Biol. 14:3752-3762).
An alternative assay measures DITHP-mediated G-protein signaling activity by monitoring the mobilization of Ca++ as an indicator of the signal transduction pathway stimulation. (See, e.g., Grynkiewicz, G. et al. (1985) J. Biol. Chem. 260:3440; McColl, S. et al. (1993) J. Immunol. 150:4550-4555; and Aussel, C. et al. (1988) J. Immunol. 140:215-220.) The assay requires preloading neutrophils or T cells with a fluorescent dye such as FURA-2 or BCECF (Universal Imaging Corp, Westchester PA) whose emission characteristics are altered by Ca++ binding. When the cells are exposed to one or more activating stimuli artificially (e.g., anti-CD3 antibody ligation of the T cell receptor) or physiologically (e.g., by allogeneic stimulation), Ca++ flux takes place. This flux can be observed and quantified by assaying the cells in a fluorometer or fluorescence activated cell sorter. Measurements of Ca++ flux are compared between cells in their normal state and those transfected with DITHP. Increased Ca++ mobilization attributable to increased DITHP concentration is proportional to DITHP activity.
DITHP transport activity is assayed by measuring uptake of labeled substrates into Xenopus laevis oocytes. Oocytes at stages V and VI are injected with DITHP mRNA (10 ng per oocyte) and incubated for 3 days at 18 °C in OR2 medium (82.5 mM NaCl, 2.5 mM KCl, 1 mM CaCl2, 1 mM MgCl2, 1 mM Na2HPO4, 5 mM Hepes, 3.8 mM NaOH, 50 μg/ml gentamycin, pH 7.8) to allow expression of DITHP protein. Oocytes are then transferred to standard uptake medium (100 mM NaCl, 2 mM KCl, 1 mM CaCl2, 1 mM MgCl2, 10 mM Hepes/Tris pH 7.5). Uptake of various substrates (e.g., amino acids, sugars, drugs, ions, and neurotransmitters) is initiated by adding labeled substrate (e.g., radiolabeled with 3H, fluorescently labeled with rhodamine, etc.) to the oocytes. After incubating for 30 minutes, uptake is terminated by washing the oocytes three times in Na+-free medium, measuring the incorporated label, and comparing with controls. DITHP transport activity is proportional to the level of internalized labeled substrate.
DITHP transferase activity is demonstrated by a test for galactosyltransferase activity. This can be determined by measuring the transfer of radiolabeled galactose from UDP-galactose to a GlcNAc-terminated oligosaccharide chain (Kolbinger, F. et al. (1998) J. Biol. Chem. 273:58-65). The sample is incubated with 14 μl of assay stock solution (180 mM sodium cacodylate, pH 6.5, 1 mg/ml bovine serum albumin, 0.26 mM UDP-galactose, 2 μl of UDP-[3H]galactose), 1 μl of MnCl2 (500 mM), and 2.5 μl of GlcNAcβO-(CH2)8-CO2Me (37 mg/ml in dimethyl sulfoxide) for 60 minutes at 37 °C. The reaction is quenched by the addition of 1 ml of water and loaded on a C18 Sep-Pak cartridge (Waters), and the column is washed twice with 5 ml of water to remove unreacted UDP-[3H]galactose. The [3H]galactosylated GlcNAcβO-(CH2)8-CO2Me remains bound to the column during the water washes and is eluted with 5 ml of methanol. Radioactivity in the eluted material is measured by liquid scintillation counting and is proportional to galactosyltransferase activity in the starting sample.
In the alternative, DITHP induction by heat or toxins may be demonstrated using primary cultures of human fibroblasts or human cell lines such as CCL-13, HEK293, or HEP G2 (ATCC). To heat induce DITHP expression, aliquots of cells are incubated at 42 °C for 15, 30, or 60 minutes. Control aliquots are incubated at 37 °C for the same time periods. To induce DITHP expression by toxins, aliquots of cells are treated with 100 μM arsenite or 20 mM azetidine-2-carboxylic acid for 0, 3, 6, or 12 hours. After exposure to heat, arsenite, or the amino acid analogue, samples of the treated cells are harvested and cell lysates prepared for analysis by western blot. Cells are lysed in lysis buffer containing 1% Nonidet P-40, 0.15 M NaCl, 50 mM Tris-HCl, 5 mM EDTA, 2 mM N-ethylmaleimide, 2 mM phenylmethylsulfonyl fluoride, 1 mg/ml leupeptin, and 1 mg/ml pepstatin.
Twenty micrograms of the cell lysate is separated on an 8% SDS-PAGE gel and transferred to a membrane. After blocking with 5% nonfat dry milk/phosphate-buffered saline for 1 h, the membrane is incubated overnight at 4 °C or at room temperature for 2-4 hours with a 1:1000 dilution of anti-DITHP serum in 2% nonfat dry milk/phosphate-buffered saline. The membrane is then washed and incubated with a 1:1000 dilution of horseradish peroxidase-conjugated goat anti-rabbit IgG in 2% dry milk/phosphate-buffered saline. After washing with 0.1% Tween 20 in phosphate-buffered saline, the DITHP protein is detected and compared to controls using chemiluminescence.
Alternatively, DITHP protease activity is measured by the hydrolysis of appropriate synthetic peptide substrates conjugated with various chromogenic molecules in which the degree of hydrolysis is quantified by spectrophotometric (or fluorometric) absorption of the released chromophore (Beynon, R.J. and J.S. Bond (1994) Proteolytic Enzymes: A Practical Approach, Oxford University Press, New York NY, pp. 25-55). Peptide substrates are designed according to the category of protease activity as endopeptidase (serine, cysteine, aspartic proteases, or metalloproteases), aminopeptidase (leucine aminopeptidase), or carboxypeptidase (carboxypeptidases A and B, procollagen C-proteinase). Commonly used chromogens are 2-naphthylamine, 4-nitroaniline, and furylacrylic acid. Assays are performed at ambient temperature and contain an aliquot of the enzyme and the appropriate substrate in a suitable buffer. Reactions are carried out in an optical cuvette, and the increase/decrease in absorbance of the chromogen released during hydrolysis of the peptide substrate is measured. The change in absorbance is proportional to the DITHP protease activity in the assay.
In the alternative, an assay for DITHP protease activity takes advantage of fluorescence resonance energy transfer (FRET) that occurs when one donor and one acceptor fluorophore with an appropriate spectral overlap are in close proximity. A flexible peptide linker containing a cleavage site specific for DITHP is fused between a red-shifted variant (RSGFP4) and a blue variant (BFP5) of Green Fluorescent Protein. This fusion protein has spectral properties that suggest energy transfer is occurring from BFP5 to RSGFP4. When the fusion protein is incubated with DITHP, the substrate is cleaved, and the two fluorescent proteins dissociate. This is accompanied by a marked decrease in energy transfer which is quantified by comparing the emission spectra before and after the addition of DITHP (Mitra, R.D. et al. (1996) Gene 173:13-17). This assay can also be performed in living cells. In this case the fluorescent substrate protein is expressed constitutively in cells and DITHP is introduced on an inducible vector so that FRET can be monitored in the presence and absence of DITHP (Sagot, I. et al. (1999) FEBS Lett. 447:53-57).
A method to determine the nucleic acid binding activity of DITHP involves a polyacrylamide gel mobility-shift assay. In preparation for this assay, DITHP is expressed by transforming a mammalian cell line such as COS7, HeLa or CHO with a eukaryotic expression vector containing DITHP cDNA. The cells are incubated for 48-72 hours after transformation under conditions appropriate for the cell line to allow expression and accumulation of DITHP. Extracts containing solubilized proteins can be prepared from cells expressing DITHP by methods well known in the art. Portions of the extract containing DITHP are added to [32P]-labeled RNA or DNA. Radioactive nucleic acid can be synthesized in vitro by techniques well known in the art. The mixtures are incubated at 25 °C in the presence of RNase- and DNase-inhibitors under buffered conditions for 5-10 minutes.
After incubation, the samples are analyzed by polyacrylamide gel electrophoresis followed by autoradiography. The presence of a band on the autoradiogram indicates the formation of a complex between DITHP and the radioactive transcript. A band of similar mobility will not be present in samples prepared using control extracts prepared from untransformed cells.
In the alternative, a method to determine the methylase activity of a DITHP measures transfer of radiolabeled methyl groups between a donor substrate and an acceptor substrate. Reaction mixtures (50 μl final volume) contain 15 mM HEPES, pH 7.9, 1.5 mM MgCl2, 10 mM dithiothreitol, 3% polyvinylalcohol, 1.5 μCi [methyl-3H]AdoMet (0.375 μM AdoMet) (DuPont-NEN), 0.6 μg DITHP, and acceptor substrate (e.g., 0.4 μg [35S]RNA, or 6-mercaptopurine (6-MP) to 1 mM final concentration). Reaction mixtures are incubated at 30 °C for 30 minutes, then 65 °C for 5 minutes. Analysis of [methyl-3H]RNA is as follows: 1) 50 μl of 2 x loading buffer (20 mM Tris-HCl, pH 7.6, 1 M LiCl, 1 mM EDTA, 1% sodium dodecyl sulphate (SDS)) and 50 μl oligo d(T)-cellulose (10 mg/ml in 1 x loading buffer) are added to the reaction mixture, and incubated at ambient temperature with shaking for 30 minutes; 2) reaction mixtures are transferred to a 96-well filtration plate attached to a vacuum apparatus; 3) each sample is washed sequentially with three 2.4 ml aliquots of 1 x oligo d(T) loading buffer containing 0.5% SDS, 0.1% SDS, or no SDS; and 4) RNA is eluted with 300 μl of water into a 96-well collection plate, transferred to scintillation vials containing liquid scintillant, and radioactivity determined. Analysis of [methyl-3H]6-MP is as follows: 1) 500 μl 0.5 M borate buffer, pH 10.0, and then 2.5 ml of 20% (v/v) isoamyl alcohol in toluene are added to the reaction mixtures; 2) the samples are mixed by vigorous vortexing for ten seconds; 3) after centrifugation at 700g for 10 minutes, 1.5 ml of the organic phase is transferred to scintillation vials containing 0.5 ml absolute ethanol and liquid scintillant, and radioactivity determined; and 4) results are corrected for the extraction of 6-MP into the organic phase (approximately 41%).
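The final correction step for the 6-MP analysis can be sketched as follows. The sketch assumes that "corrected for the extraction of 6-MP into the organic phase (approximately 41%)" means dividing the measured counts by the extraction efficiency; the function name and the example count are hypothetical, and any additional scaling (for example, for the fraction of the organic phase that was counted) is deliberately left out because the text does not specify it.

```python
def correct_for_extraction(measured_counts, extraction_efficiency=0.41):
    """Scale measured radioactivity to account for incomplete extraction.

    measured_counts       -- counts measured in the organic phase (e.g. cpm)
    extraction_efficiency -- fraction of 6-MP recovered in the organic phase
                             (approximately 0.41 according to the text)
    """
    if not 0 < extraction_efficiency <= 1:
        raise ValueError("extraction_efficiency must be in the interval (0, 1]")
    return measured_counts / extraction_efficiency


if __name__ == "__main__":
    print(round(correct_for_extraction(410.0), 1))  # hypothetical count
```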
An assay for adhesion activity of DITHP measures the disruption of cytoskeletal filament networks upon overexpression of DITHP in cultured cell lines (Rezniczek, G.A. et al. (1998) J. Cell Biol. 141:209-225). cDNA encoding DITHP is subcloned into a mammalian expression vector that drives high levels of cDNA expression. This construct is transfected into cultured cells, such as rat kangaroo PtK2 or rat bladder carcinoma 804G cells. Actin filaments and intermediate filaments such as keratin and vimentin are visualized by immunofluorescence microscopy using antibodies and techniques well known in the art. The configuration and abundance of cytoskeletal filaments can be assessed and quantified using confocal imaging techniques. In particular, the bundling and collapse of cytoskeletal filament networks is indicative of DITHP adhesion activity.
Alternatively, an assay for DITHP activity measures the expression of DITHP on the cell surface. cDNA encoding DITHP is transfected into a non-leukocytic cell line. Cell surface proteins are labeled with biotin (de la Fuente, M.A. et al. (1997) Blood 90:2398-2405). Immunoprecipitations are performed using DITHP-specific antibodies, and immunoprecipitated samples are analyzed using SDS-PAGE and immunoblotting techniques. The ratio of labeled immunoprecipitant to unlabeled immunoprecipitant is proportional to the amount of DITHP expressed on the cell surface.
Alternatively, an assay for DITHP activity measures the amount of cell aggregation induced by overexpression of DITHP. In this assay, cultured cells such as NIH3T3 are transfected with cDNA encoding DITHP contained within a suitable mammalian expression vector under control of a strong promoter. Cotransfection with cDNA encoding a fluorescent marker protein, such as Green Fluorescent Protein (CLONTECH), is useful for identifying stable transfectants. The amount of cell agglutination, or clumping, associated with transfected cells is compared with that associated with untransfected cells. The amount of cell agglutination is a direct measure of DITHP activity.
DITHP may recognize and precipitate antigen from serum. This activity can be measured by the quantitative precipitin reaction (Golub, E.S. et al. (1987) Immunology: A Synthesis, Sinauer Associates, Sunderland MA, pages 113-115). DITHP is isotopically labeled using methods known in the art. Various serum concentrations are added to constant amounts of labeled DITHP. DITHP-antigen complexes precipitate out of solution and are collected by centrifugation. The amount of precipitable DITHP-antigen complex is proportional to the amount of radioisotope detected in the precipitate. The amount of precipitable DITHP-antigen complex is plotted against the serum concentration. For various serum concentrations, a characteristic precipitation curve is obtained, in which the amount of precipitable DITHP-antigen complex initially increases proportionately with increasing serum concentration, peaks at the equivalence point, and then decreases proportionately with further increases in serum concentration. Thus, the amount of precipitable DITHP-antigen complex is a measure of DITHP activity which is characterized by sensitivity to both limiting and excess quantities of antigen.
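The precipitation curve described above rises, peaks at the equivalence point, and then falls. A minimal way to locate that peak in a measured series is sketched below; the function name and the example data are hypothetical, and the sketch simply reports the tested serum concentration with the highest precipitated radioactivity rather than fitting a curve.

```python
def equivalence_point(serum_concentrations, precipitated_counts):
    """Return (serum concentration, counts) at the peak of the precipitin curve."""
    if len(serum_concentrations) != len(precipitated_counts):
        raise ValueError("inputs must have the same length")
    if not precipitated_counts:
        raise ValueError("no data points supplied")
    # Index of the largest precipitated signal among the tested concentrations.
    peak = max(range(len(precipitated_counts)), key=precipitated_counts.__getitem__)
    return serum_concentrations[peak], precipitated_counts[peak]


if __name__ == "__main__":
    serum = [1, 2, 4, 8, 16]             # hypothetical relative concentrations
    counts = [100, 220, 400, 310, 150]   # hypothetical cpm in the precipitate
    print(equivalence_point(serum, counts))
```

The resolution of the estimate is limited by the spacing of the serum concentrations actually tested.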
A microtubule motility assay for DITHP measures motor protein activity. In this assay, recombinant DITHP is immobilized onto a glass slide or similar substrate. Taxol-stabilized bovine brain microtubules (commercially available) in a solution containing ATP and cytosolic extract are perfused onto the slide. Movement of microtubules as driven by DITHP motor activity can be visualized and quantified using video-enhanced light microscopy and image analysis techniques. DITHP motor protein activity is directly proportional to the frequency and velocity of microtubule movement.
Alternatively, an assay for DITHP measures the formation of protein filaments in vitro. A solution of DITHP at a concentration greater than the "critical concentration" for polymer assembly is applied to carbon-coated grids. Appropriate nucleation sites may be supplied in the solution. The grids are negative stained with 0.7% (w/v) aqueous uranyl acetate and examined by electron microscopy. The appearance of filaments of approximately 25 nm (microtubules), 8 nm (actin), or 10 nm (intermediate filaments) is a demonstration of protein activity.
DITHP electron transfer activity is demonstrated by oxidation or reduction of NADP. Substrates such as Asn-βGal, biocytidine, or ubiquinone-10 may be used. The reaction mixture contains 1-2 mg/ml DITHP, 15 mM substrate, and 2.4 mM NAD(P)+ in 0.1 M phosphate buffer, pH 7.1 (oxidation reaction), or 2.0 mM NAD(P)H, in 0.1 M Na2HPO4 buffer, pH 7.4 (reduction reaction); in a total volume of 0.1 ml. FAD may be included with NAD, according to methods well known in the art. Changes in absorbance are measured using a recording spectrophotometer. The amount of NAD(P)H is stoichiometrically equivalent to the amount of substrate initially present, and the change in A340 is a direct measure of the amount of NAD(P)H produced; ΔA340 = 6620[NADH]. DITHP activity is proportional to the amount of NAD(P)H present in the assay. The increase in extinction coefficient of NAD(P)H coenzyme at 340 nm is a measure of oxidation activity, or the decrease in extinction coefficient of NAD(P)H coenzyme at 340 nm is a measure of reduction activity (Dalziel, K. (1963) J. Biol. Chem. 238:2850-2858).
DITHP transcription factor activity is measured by its ability to stimulate transcription of a reporter gene (Liu, H.Y. et al. (1997) EMBO J. 16:5289-5298). The assay entails the use of a well characterized reporter gene construct, LexAop-LacZ, that consists of LexA DNA transcriptional control elements (LexAop) fused to sequences encoding the E. coli LacZ enzyme. The methods for constructing and expressing fusion genes, introducing them into cells, and measuring LacZ enzyme activity, are well known to those skilled in the art. Sequences encoding DITHP are cloned into a plasmid that directs the synthesis of a fusion protein, LexA-DITHP, consisting of DITHP and a DNA binding domain derived from the LexA transcription factor. The resulting plasmid, encoding a LexA-DITHP fusion protein, is introduced into yeast cells along with a plasmid containing the LexAop-LacZ reporter gene. The amount of LacZ enzyme activity associated with LexA-DITHP transfected cells, relative to control cells, is proportional to the amount of transcription stimulated by the DITHP.
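The comparison of LacZ activity in LexA-DITHP cells relative to control cells can be expressed as a fold activation, as in the sketch below. The function name, the use of replicate means, and the example activity values are assumptions for illustration; the text itself does not prescribe how replicates are combined.

```python
from statistics import mean


def fold_activation(lexa_dithp_units, control_units):
    """Mean LacZ activity of LexA-DITHP cells divided by that of control cells.

    Both arguments are sequences of replicate LacZ activity measurements
    in the same (arbitrary) units.
    """
    if not lexa_dithp_units or not control_units:
        raise ValueError("each condition needs one or more measurements")
    control_mean = mean(control_units)
    if control_mean <= 0:
        raise ValueError("mean control activity must be positive")
    return mean(lexa_dithp_units) / control_mean


if __name__ == "__main__":
    # Hypothetical replicate measurements.
    print(fold_activation([120, 130, 110], [10, 12, 8]))
```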
Chromatin activity of DITHP is demonstrated by measuring sensitivity to DNase I (Dawson, B.A. et al. (1989) J. Biol. Chem. 264:12830-12837). Samples are treated with DNase I, followed by insertion of a cleavable biotinylated nucleotide analog, 5-[(N-biotinamido)hexanoamido-ethyl-1,3-thiopropionyl-3-aminoallyl]-2'-deoxyuridine 5'-triphosphate using nick-repair techniques well known to those skilled in the art. Following purification and digestion with EcoRI restriction endonuclease, biotinylated sequences are affinity isolated by sequential binding to streptavidin and biotincellulose.
Another specific assay demonstrates the ion conductance capacity of DITHP using an electrophysiological assay. DITHP is expressed by transforming a mammalian cell line such as COS7, HeLa or CHO with a eukaryotic expression vector encoding DITHP. Eukaryotic expression vectors are commercially available, and the techniques to introduce them into cells are well known to those skilled in the art. A small amount of a second plasmid, which expresses any one of a number of marker genes such as β-galactosidase, is co-transformed into the cells in order to allow rapid identification of those cells which have taken up and expressed the foreign DNA. The cells are incubated for 48-72 hours after transformation under conditions appropriate for the cell line to allow expression and accumulation of DITHP and β-galactosidase. Transformed cells expressing β-galactosidase are stained blue when a suitable colorimetric substrate is added to the culture media under conditions that are well known in the art. Stained cells are tested for differences in membrane conductance due to various ions by electrophysiological techniques that are well known in the art. Untransformed cells, and/or cells transformed with either vector sequences alone or β-galactosidase sequences alone, are used as controls and tested in parallel.
The contribution of DITHP to cation or anion conductance can be shown by incubating the cells with antibodies specific for DITHP. The antibodies will bind to the extracellular side of DITHP, thereby blocking the pore in the ion channel, and the associated conductance.
XV. Functional Assays
DITHP function is assessed by expressing dithp at physiologically elevated levels in mammalian cell culture systems. cDNA is subcloned into a mammalian expression vector containing a strong promoter that drives high levels of cDNA expression. Vectors of choice include pCMV SPORT (Life Technologies) and pCR3.1 (Invitrogen Corporation, Carlsbad CA), both of which contain the cytomegalovirus promoter. 5-10 μg of recombinant vector are transiently transfected into a human cell line, preferably of endothelial or hematopoietic origin, using either liposome formulations or electroporation. 1-2 μg of an additional plasmid containing sequences encoding a marker protein are co-transfected.
Expression of a marker protein provides a means to distinguish transfected cells from nontransfected cells and is a reliable predictor of cDNA expression from the recombinant vector. Marker proteins of choice include, e.g., Green Fluorescent Protein (GFP; CLONTECH), CD64, or a CD64-GFP fusion protein. Flow cytometry (FCM), an automated laser optics-based technique, is used to identify transfected cells expressing GFP or CD64-GFP and to evaluate the apoptotic state of the cells and other cellular properties.
FCM detects and quantifies the uptake of fluorescent molecules that diagnose events preceding or coincident with cell death. These events include changes in nuclear DNA content as measured by staining of DNA with propidium iodide; changes in cell size and granularity as measured by forward light scatter and 90 degree side light scatter; down-regulation of DNA synthesis as measured by decrease in bromodeoxyuridine uptake; alterations in expression of cell surface and intracellular proteins as measured by reactivity with specific antibodies; and alterations in plasma membrane composition as measured by the binding of fluorescein-conjugated Annexin V protein to the cell surface. Methods in flow cytometry are discussed in Ormerod, M.G. (1994) Flow Cytometry, Oxford, New York NY.
The influence of DITHP on gene expression can be assessed using highly purified populations of ceUs transfected with sequences encoding DITHP and either CD64 or CD64-GFP. CD64 and CD64-GFP are expressed on tihe surface of transfected ceUs and bind to conserved regions of human immunoglobulin G (IgG). Transfected ceHs are efficiently separated from nontransfected ceUs using magnetic beads coated with either human IgG or antibody against CD64 (DYNAL, Inc., Lake Success NY). mRNA can be purified from the ceUs using methods weU known by those of skill in the art. Expression of mRNA encoding DTTHP and other genes of interest can be analyzed by northern analysis or microarray techniques.
XVI. Production of Antibodies
DITHP substantially purified using polyacrylamide gel electrophoresis (PAGE; see, e.g., Harrington, M.G. (1990) Methods Enzymol. 182:488-495), or other purification techniques, is used to immunize rabbits and to produce antibodies using standard protocols. Alternatively, the DITHP amino acid sequence is analyzed using LASERGENE software (DNASTAR) to determine regions of high immunogenicity, and a corresponding peptide is synthesized and used to raise antibodies by means known to those of skill in the art. Methods for selection of appropriate epitopes, such as those near the C-terminus or in hydrophilic regions, are well described in the art. (See, e.g., Ausubel, 1995, supra, Chapter 11.) Typically, peptides 15 residues in length are synthesized using an ABI 431A peptide synthesizer (Applied Biosystems) using Fmoc chemistry and coupled to KLH (Sigma) by reaction with N-maleimidobenzoyl-N-hydroxysuccinimide ester (MBS) to increase immunogenicity. (See, e.g., Ausubel, supra.) Rabbits are immunized with the peptide-KLH complex in complete Freund's adjuvant. Resulting antisera are tested for antipeptide activity by, for example, binding the peptide to plastic, blocking with 1% BSA, reacting with rabbit antisera, washing, and reacting with radio-iodinated goat anti-rabbit IgG. Antisera with antipeptide activity are tested for anti-DITHP activity using protocols well known in the art, including ELISA, RIA, and immunoblotting.
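The selection of a hydrophilic 15-residue peptide described above can be illustrated with a minimal sketch. The specification does not state which algorithm LASERGENE applies, so the sketch substitutes the published Hopp-Woods hydrophilicity scale as a stand-in; the function name `most_hydrophilic_window` and the example sequence are hypothetical, and a high hydrophilicity score is only a rough proxy for immunogenicity.

```python
# Hopp-Woods hydrophilicity values (Hopp and Woods, 1981); higher = more hydrophilic.
# Assumption: this scale stands in for whatever LASERGENE computes internally.
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}


def most_hydrophilic_window(seq, width=15):
    """Return (start_index, peptide, mean_score) for the most hydrophilic window.

    start_index is 0-based. Residues outside the 20 standard amino acids
    raise KeyError. Ties are resolved in favor of the earliest window.
    """
    seq = seq.upper()
    if len(seq) < width:
        raise ValueError("sequence is shorter than the window width")
    best = None
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        score = sum(HOPP_WOODS[aa] for aa in window) / width
        if best is None or score > best[2]:
            best = (i, window, score)
    return best


if __name__ == "__main__":
    # Hypothetical sequence: a hydrophobic stretch followed by a charged stretch.
    example = "L" * 20 + "KDERKDERKDERKDE" + "F" * 5
    print(most_hydrophilic_window(example))
```

The default window width of 15 matches the peptide length given in the text; in practice a candidate would also be checked for proximity to the C-terminus, as the paragraph notes.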
XVII. Purification of Naturally Occurring DITHP Using Specific Antibodies
Naturally occurring or recombinant DITHP is substantially purified by immunoaffinity chromatography using antibodies specific for DITHP. An immunoaffinity column is constructed by covalently coupling anti-DITHP antibody to an activated chromatographic resin, such as CNBr-activated SEPHAROSE (Amersham Pharmacia Biotech). After the coupling, the resin is blocked and washed according to the manufacturer's instructions. Media containing DITHP are passed over the immunoaffinity column, and the column is washed under conditions that allow the preferential absorbance of DITHP (e.g., high ionic strength buffers in the presence of detergent). The column is eluted under conditions that disrupt antibody/DITHP binding (e.g., a buffer of pH 2 to pH 3, or a high concentration of a chaotrope, such as urea or thiocyanate ion), and DITHP is collected.
XVIII. Identification of Molecules Which Interact with DITHP
DITHP, or biologically active fragments thereof, are labeled with 125I Bolton-Hunter reagent. (See, e.g., Bolton, A.E. and W.M. Hunter (1973) Biochem. J. 133:529-539.) Candidate molecules previously arrayed in the wells of a multi-well plate are incubated with the labeled DITHP, washed, and any wells with labeled DITHP complex are assayed. Data obtained using different concentrations of DITHP are used to calculate values for the number, affinity, and association of DITHP with the candidate molecules.
Alternatively, molecules interacting with DITHP are analyzed using the yeast two-hybrid system as described in Fields, S. and O. Song (1989) Nature 340:245-246, or using commercially available kits based on the two-hybrid system, such as the MATCHMAKER system (CLONTECH).
DITHP may also be used in the PATHCALLING process (CuraGen Corp., New Haven CT) which employs the yeast two-hybrid system in a high-throughput manner to determine all interactions between the proteins encoded by two large libraries of genes (Nandabalan, K. et al. (2000) U.S. Patent No. 6,057,101).
All publications and patents mentioned in the above specification are herein incorporated by reference. Various modifications and variations of the described method and system of the invention will be apparent to those skilled in the art without departing from the scope and spirit of the invention. Although the invention has been described in connection with specific preferred embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the above-described modes for carrying out the invention which are obvious to those skilled in the field of molecular biology or related fields are intended to be within the scope of the following claims.
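The calculation of binding-site number and affinity from data taken at several DITHP concentrations could, for example, be carried out with a Scatchard-style linear fit. The specification does not name a method, so the following is only a sketch under that assumption: it presumes a single class of independent binding sites, and the function name `scatchard_fit` and all numeric values are hypothetical.

```python
def scatchard_fit(free, bound):
    """Estimate (Kd, Bmax) by least-squares fit of bound/free against bound.

    Assumes one class of independent sites, so that
        bound / free = (Bmax - bound) / Kd,
    a straight line with slope -1/Kd and intercept Bmax/Kd.
    free and bound must share the same concentration units.
    """
    if len(free) != len(bound) or len(free) < 2:
        raise ValueError("need at least two paired measurements")
    xs = list(bound)
    ys = [b / f for b, f in zip(bound, free)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    kd = -1.0 / slope          # dissociation constant
    bmax = intercept * kd      # total concentration of binding sites
    return kd, bmax


if __name__ == "__main__":
    # Synthetic, noise-free data generated from Kd = 5 and Bmax = 100 (hypothetical).
    free = [1.0, 2.0, 5.0, 10.0, 20.0]
    bound = [100.0 * f / (5.0 + f) for f in free]
    print(scatchard_fit(free, bound))
```

With real, noisy measurements a nonlinear fit of the binding isotherm is generally more robust than the linearized form; the linear version is shown here only because it makes the relation between slope, affinity, and site number explicit.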
Figure imgf000182_0001
Figure imgf000182_0002
Figure imgf000182_0003
Figure imgf000183_0001
Figure imgf000183_0002
TABLE 2
SEQ ID NO: Template ID GI Number Probability Score Annotation
1 LI:1983416.1:2001JAN12 g2909860 3.00E-20 NADH-ubiquinone oxidoreductase subunit CI-KFYI (Homo sapiens)
2 LI:332263.1:2001JAN12 g5817244 5.00E-42 dJ20N2.1 (novel protein similar to yeast and bacterial cytosine deaminase) (Homo sapiens)
3 LI:333886.4:2001JAN12 g31207 3.00E-27 put. thyroid hormone receptor (Homo sapiens)
4 LI:478508.1:2001JAN12 g12840055 4.00E-49 putative (Mus musculus)
5 LI:307470.1:2001JAN12 g10437569 2.00E-23 unnamed protein product (Homo sapiens)
6 LI:058298.1:2001JAN12 g36615 1.00E-102 serine/threonine protein kinase (Homo sapiens)
7 LI:205527.5:2001JAN12 g1561718 1.00E-49 Human 14-3-3 epsilon mRNA, complete cds.
8 LI:231587.1:2001JAN12 g1872200 6.00E-13 alternatively spliced product using exon 13A (Homo sapiens)
9 LI:402919.1:2001JAN12 g15277706 4.00E-73 homeobox protein GBX-2b (Xenopus laevis)
10 LI:463283.1:2001JAN12 g10437485 5.00E-31 unnamed protein product (Homo sapiens)
11 LI:072560.1:2001JAN12 g1872200 4.00E-23 alternatively spliced product using exon 13A (Homo sapiens)
12 LI:1953096.1:2001JAN12 g10437569 2.00E-18 unnamed protein product (Homo sapiens)
13 LI:1076016.1:2001JAN12 g7341372 2.00E-33 retinoblastoma-binding protein 1-related protein (Rattus norvegicus)
14 LI:2082796.1:2001JAN12 g4323152 3.00E-30 Ets-protein Spi-C (Mus musculus)
15 LI:335681.3:2001JAN12 g14456631 2.00E-44 dJ54B20.4 (novel KRAB box containing C2H2 type zinc finger protein) (Homo sapiens)
16 LI:214150.1:2001JAN12 g1020145 5.00E-20 DNA binding protein (Homo sapiens)
17 LI:322783.15:2001JAN12 g7159799 2.00E-13 dJ351K20.1.1 (novel C3HC4 type Zinc finger (RING finger) protein (isoform 1)) (Homo sapiens)
18 LI:422993.1:2001JAN12 g10440136 4.00E-84 unnamed protein product (Homo sapiens)
19 LI:1172885.1:2001JAN12 g468708 1.00E-28 zinc finger protein (Homo sapiens)
20 LI:1088359.1:2001JAN12 g1336158 4.00E-46 pancreas only zinc finger protein (Rattus norvegicus)
21 LI:813422.1:2001JAN12 g7023216 1.00E-16 unnamed protein product (Homo sapiens)
22 LI:1186426.1:2001JAN12 g506502 1.00E-141 NK10 (Mus musculus)
23 LI:1182817.1:2001JAN12 g1020145 0 DNA binding protein (Homo sapiens)
24 LI:1170153.9:2001JAN12 g7023216 7.00E-68 unnamed protein product (Homo sapiens)
25 LI:1171553.1:2001JAN12 g6088100 1.00E-29 zinc finger protein (ZFD25) (Homo sapiens)
26 LI:2121978.1:2001JAN12 g6118383 2.00E-14 zinc finger protein ZNF223 (Homo sapiens)
27 LI:1174292.5:2001JAN12 g7959207 0 KIAA1473 protein (Homo sapiens)
28 LI:1179173.1:2001JAN12 g10835284 0 Zinc finger protein ZNF223 (amino acids 82-482) (Homo sapiens)
29 LI:2122025.1:2001JAN12 g2689444 1.00E-93 ZNF134 (Homo sapiens)
30 LI:2049224.1:2001JAN12 g7023216 3.00E-55 unnamed protein product (Homo sapiens)
31 LI:758541.1:2001JAN12 g340444 3.00E-61 zinc finger protein 41 (Homo sapiens)
32 LI:137815.1:2001JAN12 g6984172 0 zinc finger protein ZNF226 (Homo sapiens)
33 LI:335097.1:2001JAN12 g7020440 1.00E-24 unnamed protein product (Homo sapiens)
34 LI:232059.2:2001JAN12 g9280152 6.00E-23 unnamed protein product (Macaca fascicularis)
35 LI:400109.2:2001JAN12 g12698182 5.00E-29 hypothetical protein (Macaca fascicularis)
36 LI:329770.1:2001JAN12 g10437569 3.00E-26 unnamed protein product (Homo sapiens)
37 LI:898841.9:2001JAN12 g7542490 4.00E-07 FK506 binding protein precursor (Homo sapiens)
38 LI:1183848.3:2001JAN12 g32063 0 Human hepatoma mRNA for serine protease hepsin.
39 LI:2037121.1:2001JAN12 g1042082 1.00E-27 laminin alpha 4 chain (Homo sapiens)
40 LI:356090.1:2001JAN12 g10437485 2.00E-13 unnamed protein product (Homo sapiens)
41 LI:2.12142.1:2001JAN12 g8980667 3.00E-12 PADI-H protein (Homo sapiens)
42 LI:1096706.1:2001JAN12 g12698182 2.00E-20 hypothetical protein (Macaca fascicularis)
43 LI:012622.1:2001JAN12 g3724141 3.00E-62 myosin I (Rattus norvegicus)
44 LI:1171095.29:2001JAN12 g56054 2.00E-22 D100 (Rattus norvegicus)
45 LI:023813.1:2001JAN12 g6690248 3.00E-24 PRO0657 (Homo sapiens)
46 LI:229030.1:2001JAN12 g10437485 1.00E-20 unnamed protein product (Homo sapiens)
47 LI:1072894.9:2001JAN12 g6523797 4.00E-08 adrenal gland protein AD-002 (Homo sapiens)
48 LI:2031263.1:2001JAN12 g9588408 1.00E-82 dJ1184F4.4 (novel protein similar to nucleolar protein 4 (NOL4) (NOLP)) (Homo sapiens)
49 LI:432285.3:2001JAN12 g386987 6.00E-47 ornithine aminotransferase (Homo sapiens)
50 LI:1177772.30:2001JAN12 g8101070 0 Homo sapiens golgin-like protein (GLP) gene, complete cds.
51 LI:475420.2:2001JAN12 g4689280 0 retinoid-binding protein IRBP (Mus musculus)
52 LI:017599.3:2001JAN12 g13559179 6.00E-11 dJ1049G16.2.2 (continued from bA456N23.2 in Em:AL353777 and dJ237J2.1 in Em:AL021394) (Homo sapiens)
53 LI:030502.2:2001JAN12 g7020440 5.00E-24 unnamed protein product (Homo sapiens)
54 LI:1181337.3:2001JAN12 g1196425 1.00E-31 envelope protein (Homo sapiens)
55 LI:1164672.3:2001JAN12 g11078529 6.00E-38 putative gag-pro-pol polyprotein (DG-75 Murine leukemia virus)
56 LI:1167059.4:2001JAN12 g7688657 septin 10 (Homo sapiens)
TABLE 3
SEQ ID NO: Template ID Start Stop Frame Pfam Hit Pfam Description E-value
3 LI:333886.4:2001JAN12 779 820 forward 2 zf-C4 Zinc finger, C4 type (two domains) 0.00000073
6 LI:058298.1:2001JAN12 268 966 forward 1 pkinase Protein kinase domain 8.1E-56
9 LI:402919.1:2001JAN12 1183 1353 forward 1 homeobox Homeobox domain 4.4E-31
15 LI:335681.3:2001JAN12 384 572 forward 3 KRAB KRAB box 7.6E-46
15 LI:335681.3:2001JAN12 1002 1070 forward 3 zf-C2H2 Zinc finger, C2H2 type 0.000089
19 LI:1172885.1:2001JAN12 390 458 forward 3 zf-C2H2 Zinc finger, C2H2 type 0.00000068
20 LI:1088359.1:2001JAN12 169 237 forward 1 zf-C2H2 Zinc finger, C2H2 type 0.00000037
20 LI:1088359.1:2001JAN12 590 658 forward 2 zf-C2H2 Zinc finger, C2H2 type 0.0000073
21 LI:813422.1:2001JAN12 204 416 forward 3 KRAB KRAB box 2.7E-20
22 LI:1186426.1:2001JAN12 706 774 forward 1 zf-C2H2 Zinc finger, C2H2 type 0.00000014
23 LI:1182817.1:2001JAN12 255 443 forward 3 KRAB KRAB box 2.1E-45
23 LI:1182817.1:2001JAN12 1458 1526 forward 3 zf-C2H2 Zinc finger, C2H2 type 0.00000054
27 LI:1174292.5:2001JAN12 164 352 forward 2 KRAB KRAB box 3.2E-41
27 LI:1174292.5:2001JAN12 1334 1402 forward 2 zf-C2H2 Zinc finger, C2H2 type 0.000002
28 LI:1179173.1:2001JAN12 221 406 forward 2 KRAB KRAB box 1.9E-33
28 LI:1179173.1:2001JAN12 1229 1297 forward 2 zf-C2H2 Zinc finger, C2H2 type 0.00000073
29 LI:2122025.1:2001JAN12 229 396 forward 1 KRAB KRAB box 2.9E-17
29 LI:2122025.1:2001JAN12 801 869 forward 3 zf-C2H2 Zinc finger, C2H2 type 0.000000089
30 LI:2049224.1:2001JAN12 94 240 forward 1 KRAB KRAB box 5.4E-24
31 LI:758541.1:2001JAN12 273 341 forward 3 zf-C2H2 Zinc finger, C2H2 type 0.00000071
32 LI:137815.1:2001JAN12 522 695 forward 3 KRAB KRAB box 1.4E-14
32 LI:137815.1:2001JAN12 2136 2204 forward 3 zf-C2H2 Zinc finger, C2H2 type 0.00000051
43 LI:012622.1:2001JAN12 1083 1424 forward 3 myosin_head Myosin head (motor domain) 3.6E-28
43 LI:012622.1:2001JAN12 487 687 forward 1 myosin_head Myosin head (motor domain) 1.2E-23
43 LI:012622.1:2001JAN12 920 991 forward 2 myosin_head Myosin head (motor domain) 7.6E-09
51 LI:475420.2:2001JAN12 561 1589 forward 3 IRBP Interphotoreceptor retinoid-binding pro- 1.4E-185
51 LI:475420.2:2001JAN12 2410 3453 forward 1 IRBP Interphotoreceptor retinoid-binding pro- 8E-38
51 LI:475420.2:2001JAN12 2519 3403 forward 2 IRBP Interphotoreceptor retinoid-binding pro- 0.0000063
55 LI:1164672.3:2001JAN12 945 1115 forward 3 gag_MA Matrix protein (MA), p15 2.8E-15
56 LI:1167059.4:2001JAN12 267 1088 forward 3 GTP_CDC Cell division protein 2E-114
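The Start, Stop, and Frame columns of Table 3 appear to be nucleotide coordinates on the template, with forward frame n beginning at nucleotide n. That reading is an inference from the values in the table, not something the specification states. Under that assumption, the frame can be recomputed from the start coordinate and the hit length converted to codons, as in the following sketch; the function names are hypothetical.

```python
def reading_frame(start):
    """Forward reading frame (1, 2, or 3) implied by a 1-based nucleotide start."""
    return (start - 1) % 3 + 1


def codon_count(start, stop):
    """Number of codons spanned by an inclusive nucleotide range."""
    length = stop - start + 1
    if length % 3 != 0:
        raise ValueError("range does not cover a whole number of codons")
    return length // 3


if __name__ == "__main__":
    # (start, stop, frame) taken from the first rows of Table 3.
    rows = [(779, 820, 2), (268, 966, 1), (1183, 1353, 1), (384, 572, 3)]
    for start, stop, frame in rows:
        print(start, stop, frame, reading_frame(start) == frame,
              codon_count(start, stop))
```

A check of this kind is also a convenient way to catch digits dropped or split during text extraction, since a damaged coordinate will usually disagree with the listed frame or fail the codon-length test.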
TABLE 4
SEQ ID NO: Template ID Start Stop Frame Domain Topology
1 LI:1983416.1:2001JAN12 1 94 forward 2 TM Cytosolic
1 LI:1983416.1:2001JAN12 95 117 forward 2 TM Transmembrane
1 LI:1983416.1:2001JAN12 118 120 forward 2 TM Non-cytosolic
2 LI:332263.1:2001JAN12 1 622 forward 1 TM Non-cytosolic
2 LI:332263.1:2001JAN12 623 645 forward 1 TM Transmembrane
2 LI:332263.1:2001JAN12 646 656 forward 1 TM Cytosolic
2 LI:332263.1:2001JAN12 657 679 forward 1 TM Transmembrane
2 LI:332263.1:2001JAN12 680 693 forward 1 TM Non-cytosolic
2 LI:332263.1:2001JAN12 694 716 forward 1 TM Transmembrane
2 LI:332263.1:2001JAN12 717 755 forward 1 TM Cytosolic
2 LI:332263.1:2001JAN12 756 778 forward 1 TM Transmembrane
2 LI:332263.1:2001JAN12 779 1110 forward 1 TM Non-cytosolic
2 LI:332263.1:2001JAN12 1 650 forward 2 TM Non-cytosolic
2 LI:332263.1:2001JAN12 651 673 forward 2 TM Transmembrane
2 LI:332263.1:2001JAN12 674 693 forward 2 TM Cytosolic
2 LI:332263.1:2001JAN12 694 716 forward 2 TM Transmembrane
2 LI:332263.1:2001JAN12 717 754 forward 2 TM Non-cytosolic
2 LI:332263.1:2001JAN12 755 777 forward 2 TM Transmembrane
2 LI:332263.1:2001JAN12 778 783 forward 2 TM Cytosolic
2 LI:332263.1:2001JAN12 784 803 forward 2 TM Transmembrane
2 LI:332263.1:2001JAN12 804 1109 forward 2 TM Non-cytosolic
2 LI:332263.1:2001JAN12 1 748 forward 3 TM Non-cytosolic
2 LI:332263.1:2001JAN12 749 766 forward 3 TM Transmembrane
2 LI:332263.1:2001JAN12 767 772 forward 3 TM Cytosolic
2 LI:332263.1:2001JAN12 773 795 forward 3 TM Transmembrane
2 LI:332263.1:2001JAN12 796 1109 forward 3 TM Non-cytosolic
5 LI:307470.1:2001JAN12 1 34 forward 3 TM Cytosolic
5 LI:307470.1:2001JAN12 35 57 forward 3 TM Transmembrane
5 LI:307470.1:2001JAN12 58 320 forward 3 TM Non-cytosolic
6 LI:058298.1:2001JAN12 1 430 forward 3 TM Non-cytosolic
6 LI:058298.1:2001JAN12 431 453 forward 3 TM Transmembrane
6 LI:058298.1:2001JAN12 454 529 forward 3 TM Cytosolic
6 LI:058298.1:2001JAN12 530 552 forward 3 TM Transmembrane
6 LI:058298.1:2001JAN12 553 558 forward 3 TM Non-cytosolic
7 LI:205527.5:2001JAN12 1 14 forward 1 TM Non-cytosolic
7 LI:205527.5:2001JAN12 15 35 forward 1 TM Transmembrane
7 LI:205527.5:2001JAN12 36 47 forward 1 TM Cytosolic
7 LI:205527.5:2001JAN12 48 70 forward 1 TM Transmembrane
7 LI:205527.5:2001JAN12 71 90 forward 1 TM Non-cytosolic
7 LI:205527.5:2001JAN12 1 14 forward 2 TM Non-cytosolic
7 LI:205527.5:2001JAN12 15 37 forward 2 TM Transmembrane
7 LI:205527.5:2001JAN12 38 89 forward 2 TM Cytosolic
7 LI:205527.5:2001JAN12 1 14 forward 3 TM Non-cytosolic
7 LI:205527.5:2001JAN12 15 37 forward 3 TM Transmembrane
7 LI:205527.5:2001JAN12 38 89 forward 3 TM Cytosolic
9 LI:402919.1:2001JAN12 1 572 forward 1 TM Non-cytosolic
9 LI:402919.1:2001JAN12 573 595 forward 1 TM Transmembrane
9 LI:402919.1:2001JAN12 596 668 forward 1 TM Cytosolic
9 LI:402919.1:2001JAN12 669 691 forward 1 TM Transmembrane
9 LI:402919.1:2001JAN12 692 700 forward 1 TM Non-cytosolic
9 LI:402919.1:2001JAN12 701 723 forward 1 TM Transmembrane
9 LI:402919.1:2001JAN12 724 741 forward 1 TM Cytosolic
9 LI:402919.1:2001JAN12 1 571 forward 3 TM Non-cytosolic
9 LI:402919.1:2001JAN12 572 594 forward 3 TM Transmembrane
9 LI:402919.1:2001JAN12 595 676 forward 3 TM Cytosolic
9 LI:402919.1:2001JAN12 677 699 forward 3 TM Transmembrane
9 LI:402919.1:2001JAN12 700 708 forward 3 TM Non-cytosolic
9 LI:402919.1:2001JAN12 709 731 forward 3 TM Transmembrane
9 LI:402919.1:2001JAN12 732 740 forward 3 TM Cytosolic
11 LI:072560.1:2001JAN12 1 285 forward 1 TM Cytosolic
11 LI:072560.1:2001JAN12 286 308 forward 1 TM Transmembrane
11 LI:072560.1:2001JAN12 309 322 forward 1 TM Non-cytosolic
11 LI:072560.1:2001JAN12 323 345 forward 1 TM Transmembrane
11 LI:072560.1:2001JAN12 346 349 forward 1 TM Cytosolic
11 LI:072560.1:2001JAN12 350 372 forward 1 TM Transmembrane
11 LI:072560.1:2001JAN12 373 485 forward 1 TM Non-cytosolic
11 LI:072560.1:2001JAN12 486 508 forward 1 TM Transmembrane
11 LI:072560.1:2001JAN12 509 520 forward 1 TM Cytosolic
11 LI:072560.1:2001JAN12 521 540 forward 1 TM Transmembrane
11 LI:072560.1:2001JAN12 541 861 forward 1 TM Non-cytosolic
11 LI:072560.1:2001JAN12 1 325 forward 2 TM Non-cytosolic
11 LI:072560.1:2001JAN12 326 348 forward 2 TM Transmembrane
11 LI:072560.1:2001JAN12 349 349 forward 2 TM Cytosolic
11 LI:072560.1:2001JAN12 350 372 forward 2 TM Transmembrane
11 LI:072560.1:2001JAN12 373 861 forward 2 TM Non-cytosolic
14 LI:2082796.1:2001JAN12 1 6 forward 2 TM Cytosolic
14 LI:2082796.1:2001JAN12 7 29 forward 2 TM Transmembrane
14 LI:2082796.1:2001JAN12 30 196 forward 2 TM Non-cytosolic
17 LI:322783.15:2001JAN12 1 128 forward 2 TM Cytosolic
17 LI:322783.15:2001JAN12 129 151 forward 2 TM Transmembrane
17 LI:322783.15:2001JAN12 152 254 forward 2 TM Non-cytosolic
18 LI:422993.1:2001JAN12 1 14 forward 1 TM Non-cytosolic
18 LI:422993.1:2001JAN12 15 37 forward 1 TM Transmembrane
18 LI:422993.1:2001JAN12 38 110 forward 1 TM Cytosolic
18 LI:422993.1:2001JAN12 111 133 forward 1 TM Transmembrane
18 LI:422993.1:2001JAN12 134 152 forward 1 TM Non-cytosolic
18 LI:422993.1:2001JAN12 153 172 forward 1 TM Transmembrane
18 LI:422993.1:2001JAN12 173 476 forward 1 TM Cytosolic
18 LI:422993.1:2001JAN12 1 14 forward 2 TM Non-cytosolic
18 LI:422993.1:2001JAN12 15 37 forward 2 TM Transmembrane
18 LI:422993.1:2001JAN12 38 48 forward 2 TM Cytosolic
18 LI:422993.1:2001JAN12 49 71 forward 2 TM Transmembrane
18 LI:422993.1:2001JAN12 72 476 forward 2 TM Non-cytosolic
18 LI:422993.1:2001JAN12 1 14 forward 3 TM Non-cytosolic
18 LI:422993.1:2001JAN12 15 37 forward 3 TM Transmembrane
18 LI:422993.1:2001JAN12 38 108 forward 3 TM Cytosolic
18 LI:422993.1:2001JAN12 109 126 forward 3 TM Transmembrane
18 LI:422993.1:2001JAN12 127 475 forward 3 TM Non-cytosolic
20 LI:1088359.1:2001JAN12 1 682 forward 1 TM Non-cytosolic
20 LI:1088359.1:2001JAN12 683 705 forward 1 TM Transmembrane
20 LI:1088359.1:2001JAN12 706 765 forward 1 TM Cytosolic
20 LI:1088359.1:2001JAN12 766 788 forward 1 TM Transmembrane
20 LI:1088359.1:2001JAN12 789 835 forward 1 TM Non-cytosolic
20 LI:1088359.1:2001JAN12 836 858 forward 1 TM Transmembrane
20 LI:1088359.1:2001JAN12 859 950 forward 1 TM Cytosolic
20 LI:1088359.1:2001JAN12 1 595 forward 2 TM Non-cytosolic
20 LI:1088359.1:2001JAN12 596 618 forward 2 TM Transmembrane
20 LI:1088359.1:2001JAN12 619 630 forward 2 TM Cytosolic
20 LI:1088359.1:2001JAN12 631 653 forward 2 TM Transmembrane
20 LI:1088359.1:2001JAN12 654 949 forward 2 TM Non-cytosolic
21 LI:813422.1:2001JAN12 1 275 forward 1 TM Cytosolic
21 LI:813422.1:2001JAN12 276 298 forward 1 TM Transmembrane
21 LI:813422.1:2001JAN12 299 630 forward 1 TM Non-cytosolic
22 LI:1186426.1:2001JAN12 1 19 forward 2 TM Non-cytosolic
22 LI:1186426.1:2001JAN12 20 42 forward 2 TM Transmembrane
22 LI:1186426.1:2001JAN12 43 332 forward 2 TM Cytosolic
22 LI:1186426.1:2001JAN12 333 355 forward 2 TM Transmembrane
22 LI:1186426.1:2001JAN12 356 649 forward 2 TM Non-cytosolic
23 LI:1182817.1:2001JAN12 1 1063 forward 3 TM Non-cytosolic
23 LI:1182817.1:2001JAN12 1064 1086 forward 3 TM Transmembrane
23 LI:1182817.1:2001JAN12 1087 1172 forward 3 TM Cytosolic
23 LI:1182817.1:2001JAN12 1173 1195 forward 3 TM Transmembrane
23 LI:1182817.1:2001JAN12 1196 1526 forward 3 TM Non-cytosolic
24 LI:1170153.9:2001JAN12 1 56 forward 2 TM Cytosolic
24 LI:1170153.9:2001JAN12 57 79 forward 2 TM Transmembrane
24 LI:1170153.9:2001JAN12 80 83 forward 2 TM Non-cytosolic
24 LI:1170153.9:2001JAN12 84 106 forward 2 TM Transmembrane
24 LI:1170153.9:2001JAN12 107 470 forward 2 TM Cytosolic
Figure imgf000192_0001
Figure imgf000192_0002
Figure imgf000192_0003
31 LI:758541.1:2001JAN12 918 940 forward 1 TM Transmembrane
31 LI:758541.1:2001JAN12 941 944 forward 1 TM Non-cytosolic
31 LI:758541.1:2001JAN12 945 967 forward 1 TM Transmembrane
31 LI:758541.1:2001JAN12 968 977 forward 1 TM Cytosolic
31 LI:758541.1:2001JAN12 1 319 forward 2 TM Non-cytosolic
31 LI:758541.1:2001JAN12 320 342 forward 2 TM Transmembrane
31 LI:758541.1:2001JAN12 343 454 forward 2 TM Cytosolic
31 LI:758541.1:2001JAN12 455 472 forward 2 TM Transmembrane
31 LI:758541.1:2001JAN12 473 544 forward 2 TM Non-cytosolic
31 LI:758541.1:2001JAN12 545 567 forward 2 TM Transmembrane
31 LI:758541.1:2001JAN12 568 626 forward 2 TM Cytosolic
31 LI:758541.1:2001JAN12 627 649 forward 2 TM Transmembrane
31 LI:758541.1:2001JAN12 650 663 forward 2 TM Non-cytosolic
31 LI:758541.1:2001JAN12 664 686 forward 2 TM Transmembrane
31 LI:758541.1:2001JAN12 687 706 forward 2 TM Cytosolic
31 LI:758541.1:2001JAN12 707 729 forward 2 TM Transmembrane
31 LI:758541.1:2001JAN12 730 767 forward 2 TM Non-cytosolic
31 LI:758541.1:2001JAN12 768 790 forward 2 TM Transmembrane
31 LI:758541.1:2001JAN12 791 823 forward 2 TM Cytosolic
31 LI:758541.1:2001JAN12 824 846 forward 2 TM Transmembrane
31 LI:758541.1:2001JAN12 847 855 forward 2 TM Non-cytosolic
31 LI:758541.1:2001JAN12 856 875 forward 2 TM Transmembrane
31 LI:758541.1:2001JAN12 876 977 forward 2 TM Cytosolic
31 LI:758541.1:2001JAN12 1 455 forward 3 TM Cytosolic
31 LI:758541.1:2001JAN12 456 478 forward 3 TM Transmembrane
31 LI:758541.1:2001JAN12 479 544 forward 3 TM Non-cytosolic
31 LI:758541.1:2001JAN12 545 567 forward 3 TM Transmembrane
31 LI:758541.1:2001JAN12 568 621 forward 3 TM Cytosolic
31 LI:758541.1:2001JAN12 622 640 forward 3 TM Transmembrane
31 LI:758541.1:2001JAN12 641 700 forward 3 TM Non-cytosolic
31 LI:758541.1:2001JAN12 701 723 forward 3 TM Transmembrane
31 LI:758541.1:2001JAN12 724 761 forward 3 TM Cytosolic
31 LI:758541.1:2001JAN12 762 784 forward 3 TM Transmembrane
31 LI:758541.1:2001JAN12 785 834 forward 3 TM Non-cytosolic
31 LI:758541.1:2001JAN12 835 857 forward 3 TM Transmembrane
31 LI:758541.1:2001JAN12 858 877 forward 3 TM Cytosolic
31 LI:758541.1:2001JAN12 878 900 forward 3 TM Transmembrane
31 LI:758541.1:2001JAN12 901 943 forward 3 TM Non-cytosolic
31 LI:758541.1:2001JAN12 944 966 forward 3 TM Transmembrane
31 LI:758541.1:2001JAN12 967 976 forward 3 TM Cytosolic
32 LI:137815.1:2001JAN12 1 1561 forward 1 TM Non-cytosolic
32 LI:137815.1:2001JAN12 1562 1584 forward 1 TM Transmembrane
32 LI:137815.1:2001JAN12 1585 1596 forward 1 TM Cytosolic
32 LI:137815.1:2001JAN12 1597 1619 forward 1 TM Transmembrane
32 LI:137815.1:2001JAN12 1620 1657 forward 1 TM Non-cytosolic
32 LI:137815.1:2001JAN12 1 1235 forward 2 TM Non-cytosolic
32 LI:137815.1:2001JAN12 1236 1253 forward 2 TM Transmembrane
32 LI:137815.1:2001JAN12 1254 1259 forward 2 TM Cytosolic
32 LI:137815.1:2001JAN12 1260 1282 forward 2 TM Transmembrane
32 LI:137815.1:2001JAN12 1283 1613 forward 2 TM Non-cytosolic
32 LI:137815.1:2001JAN12 1614 1636 forward 2 TM Transmembrane
32 LI:137815.1:2001JAN12 1637 1656 forward 2 TM Cytosolic
32 LI:137815.1:2001JAN12 1 1050 forward 3 TM Non-cytosolic
32 LI:137815.1:2001JAN12 1051 1073 forward 3 TM Transmembrane
32 LI:137815.1:2001JAN12 1074 1144 forward 3 TM Cytosolic
32 LI:137815.1:2001JAN12 1145 1167 forward 3 TM Transmembrane
32 LI:137815.1:2001JAN12 1168 1192 forward 3 TM Non-cytosolic
32 LI:137815.1:2001JAN12 1193 1215 forward 3 TM Transmembrane
32 LI:137815.1:2001JAN12 1216 1235 forward 3 TM Cytosolic
32 LI:137815.1:2001JAN12 1236 1256 forward 3 TM Transmembrane
32 LI:137815.1:2001JAN12 1257 1265 forward 3 TM Non-cytosolic
32 LI:137815.1:2001JAN12 1266 1288 forward 3 TM Transmembrane
32 LI:137815.1:2001JAN12 1289 1593 forward 3 TM Cytosolic
32 LI:137815.1:2001JAN12 1594 1616 forward 3 TM Transmembrane
32 LI:137815.1:2001JAN12 1617 1656 forward 3 TM Non-cytosolic
33 LI:335097.1:2001JAN12 1 92 forward 3 TM Non-cytosolic
33 LI:335097.1:2001JAN12 93 115 forward 3 TM Transmembrane
33 LI:335097.1:2001JAN12 116 252 forward 3 TM Cytosolic
33 LI:335097.1:2001JAN12 253 275 forward 3 TM Transmembrane
33 LI:335097.1:2001JAN12 276 405 forward 3 TM Non-cytosolic
34 LI:232059.2:2001JAN12 1 18 forward 1 TM Cytosolic
34 LI:232059.2:2001JAN12 19 41 forward 1 TM Transmembrane
34 LI:232059.2:2001JAN12 42 60 forward 1 TM Non-cytosolic
34 LI:232059.2:2001JAN12 61 83 forward 1 TM Transmembrane
34 LI:232059.2:2001JAN12 84 89 forward 1 TM Cytosolic
34 LI:232059.2:2001JAN12 90 112 forward 1 TM Transmembrane
34 LI:232059.2:2001JAN12 113 115 forward 1 TM Non-cytosolic
34 LI:232059.2:2001JAN12 116 138 forward 1 TM Transmembrane
34 LI:232059.2:2001JAN12 139 231 forward 1 TM Cytosolic
34 LI:232059.2:2001JAN12 232 254 forward 1 TM Transmembrane
34 LI:232059.2:2001JAN12 255 277 forward 1 TM Non-cytosolic
34 LI:232059.2:2001JAN12 278 300 forward 1 TM Transmembrane
34 LI:232059.2:2001JAN12 301 319 forward 1 TM Cytosolic
34 LI:232059.2:2001JAN12 320 342 forward 1 TM Transmembrane
34 LI:232059.2:2001JAN12 343 436 forward 1 TM Non-cytosolic
34 LI:232059.2:2001JAN12 1 12 forward 2 TM Cytosolic
34 LI:232059.2:2001JAN12 13 35 forward 2 TM Transmembrane
34 LI:232059.2:2001JAN12 36 221 forward 2 TM Non-cytosolic
34 LI:232059.2:2001JAN12 222 244 forward 2 TM Transmembrane
34 LI:232059.2:2001JAN12 245 250 forward 2 TM Cytosolic
34 LI:232059.2:2001JAN12 251 273 forward 2 TM Transmembrane
34 LI:232059.2:2001JAN12 274 287 forward 2 TM Non-cytosolic
34 LI:232059.2:2001JAN12 288 310 forward 2 TM Transmembrane
34 LI:232059.2:2001JAN12 311 436 forward 2 TM Cytosolic
34 LI:232059.2:2001JAN12 1 14 forward 3 TM Non-cytosolic
34 LI:232059.2:2001JAN12 15 32 forward 3 TM Transmembrane
34 LI:232059.2:2001JAN12 33 59 forward 3 TM Cytosolic
34 LI:232059.2:2001JAN12 60 82 forward 3 TM Transmembrane
34 LI:232059.2:2001JAN12 83 91 forward 3 TM Non-cytosolic
34 LI:232059.2:2001JAN12 92 113 forward 3 TM Transmembrane
34 LI:232059.2:2001JAN12 114 261 forward 3 TM Cytosolic
34 LI:232059.2:2001JAN12 262 284 forward 3 TM Transmembrane
34 LI:232059.2:2001JAN12 285 435 forward 3 TM Non-cytosolic
35 LI:400109.2:2001JAN12 1 418 forward 1 TM Non-cytosolic
35 LI:400109.2:2001JAN12 419 438 forward 1 TM Transmembrane
35 LI:400109.2:2001JAN12 439 449 forward 1 TM Cytosolic
35 LI:400109.2:2001JAN12 450 472 forward 1 TM Transmembrane
35 LI:400109.2:2001JAN12 473 527 forward 1 TM Non-cytosolic
35 LI:400109.2:2001JAN12 1 19 forward 2 TM Non-cytosolic
35 LI:400109.2:2001JAN12 20 39 forward 2 TM Transmembrane
35 LI:400109.2:2001JAN12 40 62 forward 2 TM Cytosolic
35 LI:400109.2:2001JAN12 63 85 forward 2 TM Transmembrane
35 LI:400109.2:2001JAN12 86 144 forward 2 TM Non-cytosolic
35 LI:400109.2:2001JAN12 145 167 forward 2 TM Transmembrane
35 LI:400109.2:2001JAN12 168 238 forward 2 TM Cytosolic
35 LI:400109.2:2001JAN12 239 261 forward 2 TM Transmembrane
35 LI:400109.2:2001JAN12 262 527 forward 2 TM Non-cytosolic
36 LI:329770.1:2001JAN12 1 414 forward 1 TM Non-cytosolic
36 LI:329770.1:2001JAN12 415 437 forward 1 TM Transmembrane
36 LI:329770.1:2001JAN12 438 580 forward 1 TM Cytosolic
36 LI:329770.1:2001JAN12 581 603 forward 1 TM Transmembrane
36 LI:329770.1:2001JAN12 604 617 forward 1 TM Non-cytosolic
36 LI:329770.1:2001JAN12 618 637 forward 1 TM Transmembrane
36 LI:329770.1:2001JAN12 638 679 forward 1 TM Cytosolic
36 LI:329770.1:2001JAN12 1 298 forward 2 TM Non-cytosolic
36 LI:329770.1:2001JAN12 299 318 forward 2 TM Transmembrane
36 LI:329770.1:2001JAN12 319 367 forward 2 TM Cytosolic
36 LI:329770.1:2001JAN12 368 390 forward 2 TM Transmembrane
36 LI:329770.1:2001JAN12 391 404 forward 2 TM Non-cytosolic
36 LI:329770.1:2001JAN12 405 427 forward 2 TM Transmembrane
36 LI:329770.1:2001JAN12 428 521 forward 2 TM Cytosolic
36 LI:329770.1:2001JAN12 522 544 forward 2 TM Transmembrane
36 LI:329770.1:2001JAN12 545 553 forward 2 TM Non-cytosolic
36 LI:329770.1:2001JAN12 554 568 forward 2 TM Transmembrane
36 LI:329770.1:2001JAN12 569 580 forward 2 TM Cytosolic
36 LI:329770.1:2001JAN12 581 603 forward 2 TM Transmembrane
36 LI:329770.1:2001JAN12 604 678 forward 2 TM Non-cytosolic
36 LI:329770.1:2001JAN12 1 22 forward 3 TM Non-cytosolic
36 LI:329770.1:2001JAN12 23 45 forward 3 TM Transmembrane
36 LI:329770.1:2001JAN12 46 159 forward 3 TM Cytosolic
36 LI:329770.1:2001JAN12 160 182 forward 3 TM Transmembrane
36 LI:329770.1:2001JAN12 183 212 forward 3 TM Non-cytosolic
36 LI:329770.1:2001JAN12 213 230 forward 3 TM Transmembrane
36 LI:329770.1:2001JAN12 231 350 forward 3 TM Cytosolic
36 LI:329770.1:2001JAN12 351 373 forward 3 TM Transmembrane
36 LI:329770.1:2001JAN12 374 412 forward 3 TM Non-cytosolic
36 LI:329770.1:2001JAN12 413 435 forward 3 TM Transmembrane
36 LI:329770.1:2001JAN12 436 553 forward 3 TM Cytosolic
36 LI:329770.1:2001JAN12 554 576 forward 3 TM Transmembrane
36 LI:329770.1:2001JAN12 577 579 forward 3 TM Non-cytosolic
36 LI:329770.1:2001JAN12 580 602 forward 3 TM Transmembrane
36 LI:329770.1:2001JAN12 603 678 forward 3 TM Cytosolic
42 LI:1096706.1:2001JAN12 1 119 forward 2 TM Cytosolic
42 LI:1096706.1:2001JAN12 120 142 forward 2 TM Transmembrane
42 LI:1096706.1:2001JAN12 143 161 forward 2 TM Non-cytosolic
42 LI:1096706.1:2001JAN12 162 184 forward 2 TM Transmembrane
42 LI:1096706.1:2001JAN12 185 205 forward 2 TM Cytosolic
46 LI:229030.1:2001JAN12 1 40 forward 3 TM Cytosolic
46 LI:229030.1:2001JAN12 41 63 forward 3 TM Transmembrane
46 LI:229030.1:2001JAN12 64 82 forward 3 TM Non-cytosolic
46 LI:229030.1:2001JAN12 83 105 forward 3 TM Transmembrane
46 LI:229030.1:2001JAN12 106 177 forward 3 TM Cytosolic
47 LI:1072894.9:2001JAN12 1 55 forward 1 TM Cytosolic
47 LI:1072894.9:2001JAN12 1 55 forward 2 TM Cytosolic
49 LI:432285.3:2001JAN12 1 46 forward 1 TM Cytosolic
49 LI:432285.3:2001JAN12 47 69 forward 1 TM Transmembrane
49 LI:432285.3:2001JAN12 70 83 forward 1 TM Non-cytosolic
49 LI:432285.3:2001JAN12 84 106 forward 1 TM Transmembrane
49 LI:432285.3:2001JAN12 107 112 forward 1 TM Cytosolic
49 LI:432285.3:2001JAN12 113 135 forward 1 TM Transmembrane
49 LI:432285.3:2001JAN12 136 688 forward 1 TM Non-cytosolic
49 LI:432285.3:2001JAN12 689 711 forward 1 TM Transmembrane
49 LI:432285.3:2001JAN12 712 717 forward 1 TM Cytosolic
49 LI:432285.3:2001JAN12 718 740 forward 1 TM Transmembrane
49 LI:432285.3:2001JAN12 741 754 forward 1 TM Non-cytosolic
49 LI:432285.3:2001JAN12 755 777 forward 1 TM Transmembrane
49 LI:432285.3:2001JAN12 778 875 forward 1 TM Cytosolic
49 LI:432285.3:2001JAN12 1 3 forward 2 TM Non-cytosolic
49 LI:432285.3:2001JAN12 4 23 forward 2 TM Transmembrane
49 LI:432285.3:2001JAN12 24 42 forward 2 TM Cytosolic
49 LI:432285.3:2001JAN12 43 65 forward 2 TM Transmembrane
49 LI:432285.3:2001JAN12 66 875 forward 2 TM Non-cytosolic
49 LI:432285.3:2001JAN12 1 14 forward 3 TM Non-cytosolic
49 LI:432285.3:2001JAN12 15 34 forward 3 TM Transmembrane
49 LI:432285.3:2001JAN12 35 40 forward 3 TM Cytosolic
49 LI:432285.3:2001JAN12 41 63 forward 3 TM Transmembrane
49 LI:432285.3:2001JAN12 64 875 forward 3 TM Non-cytosolic
51 LI:475420.2:2001JAN12 1 1482 forward 1 TM Non-cytosolic
51 LI:475420.2:2001JAN12 1483 1505 forward 1 TM Transmembrane
51 LI:475420.2:2001JAN12 1506 1517 forward 1 TM Cytosolic
51 LI:475420.2:2001JAN12 1518 1540 forward 1 TM Transmembrane
51 LI:475420.2:2001JAN12 1541 1572 forward 1 TM Non-cytosolic
51 LI:475420.2:2001JAN12 1 1484 forward 2 TM Non-cytosolic
51 LI:475420.2:2001JAN12 1485 1507 forward 2 TM Transmembrane
51 LI:475420.2:2001JAN12 1508 1527 forward 2 TM Cytosolic
51 LI:475420.2:2001JAN12 1528 1550 forward 2 TM Transmembrane
51 LI:475420.2:2001JAN12 1551 1571 forward 2 TM Non-cytosolic
51 LI:475420.2:2001JAN12 1 1491 forward 3 TM Non-cytosolic
51 LI:475420.2:2001JAN12 1492 1514 forward 3 TM Transmembrane
51 LI:475420.2:2001JAN12 1515 1520 forward 3 TM Cytosolic
51 LI:475420.2:2001JAN12 1521 1543 forward 3 TM Transmembrane
51 LI:475420.2:2001JAN12 1544 1571 forward 3 TM Non-cytosolic
52 LI:017599.3:2001JAN12 1 12 forward 1 TM Cytosolic
52 LI:017599.3:2001JAN12 13 31 forward 1 TM Transmembrane
52 LI:017599.3:2001JAN12 32 280 forward 1 TM Non-cytosolic
52 LI:017599.3:2001JAN12 281 303 forward 1 TM Transmembrane
52 LI:017599.3:2001JAN12 304 306 forward 1 TM Cytosolic
52 LI:017599.3:2001JAN12 1 6 forward 3 TM Cytosolic
52 LI:017599.3:2001JAN12 7 29 forward 3 TM Transmembrane
52 LI:017599.3:2001JAN12 30 268 forward 3 TM Non-cytosolic
52 LI:017599.3:2001JAN12 269 291 forward 3 TM Transmembrane
52 LI:017599.3:2001JAN12 292 306 forward 3 TM Cytosolic
54 LI:1181337.3:2001JAN12 1 118 forward 3 TM Non-cytosolic
54 LI:1181337.3:2001JAN12 119 141 forward 3 TM Transmembrane
54 LI:1181337.3:2001JAN12 142 278 forward 3 TM Cytosolic
54 LI:1181337.3:2001JAN12 279 301 forward 3 TM Transmembrane
54 LI:1181337.3:2001JAN12 302 340 forward 3 TM Non-cytosolic
56 LI:1167059.4:2001JAN12 1 462 forward 1 TM Non-cytosolic
56 LI:1167059.4:2001JAN12 463 485 forward 1 TM Transmembrane
56 LI:1167059.4:2001JAN12 486 693 forward 1 TM Cytosolic
56 LI:1167059.4:2001JAN12 694 716 forward 1 TM Transmembrane
56 LI:1167059.4:2001JAN12 717 725 forward 1 TM Non-cytosolic
56 LI:1167059.4:2001JAN12 726 743 forward 1 TM Transmembrane
56 LI:1167059.4:2001JAN12 744 755 forward 1 TM Cytosolic
56 LI:1167059.4:2001JAN12 756 775 forward 1 TM Transmembrane
56 LI:1167059.4:2001JAN12 776 838 forward 1 TM Non-cytosolic
56 LI:1167059.4:2001JAN12 839 861 forward 1 TM Transmembrane
56 LI:1167059.4:2001JAN12 862 881 forward 1 TM Cytosolic
56 LI:1167059.4:2001JAN12 882 904 forward 1 TM Transmembrane
56 LI:1167059.4:2001JAN12 905 932 forward 1 TM Non-cytosolic
56 LI:1167059.4:2001JAN12 933 953 forward 1 TM Transmembrane
56 LI:1167059.4:2001JAN12 954 956 forward 1 TM Cytosolic
56 LI:1167059.4:2001JAN12 1 778 forward 3 TM Non-cytosolic
56 LI:1167059.4:2001JAN12 779 801 forward 3 TM Transmembrane
56 LI:1167059.4:2001JAN12 802 813 forward 3 TM Cytosolic
56 LI:1167059.4:2001JAN12 814 836 forward 3 TM Transmembrane
56 LI:1167059.4:2001JAN12 837 845 forward 3 TM Non-cytosolic
56 LI:1167059.4:2001JAN12 846 865 forward 3 TM Transmembrane
56 LI:1167059.4:2001JAN12 866 876 forward 3 TM Cytosolic
56 LI:1167059.4:2001JAN12 877 899 forward 3 TM Transmembrane
56 LI:1167059.4:2001JAN12 900 928 forward 3 TM Non-cytosolic
56 LI:1167059.4:2001JAN12 929 951 forward 3 TM Transmembrane
56 LI:1167059.4:2001JAN12 952 955 forward 3 TM Cytosolic
TABLE 5
SEQ ID NO: Template ID Component ID Start Stop
1 LI:1983416.1:2001JAN12 8140319T1 1 363
1 LI:1983416.1:2001JAN12 7976079H1 22 435
2 LI:332263.1:2001JAN12 458423T6 1542 2088
2 LI:332263.1:2001JAN12 71850401V1 1544 2090
2 LI:332263.1:2001JAN12 g3896002 1549 1997
2 LI:332263.1:2001JAN12 g7374740 1910 2260
2 LI:332263.1:2001JAN12 70312653D1 465 944
2 LI:332263.1:2001JAN12 70314846D1 465 874
2 LI:332263.1:2001JAN12 71852779V1 1707 2560
2 LI:332263.1:2001JAN12 71852887V1 1020 1925
2 LI:332263.1:2001JAN12 70313794D1 992 1621
2 LI:332263.1:2001JAN12 70311926D1 991 1420
2 LI:332263.1:2001JAN12 71851150V1 1024 1885
2 LI:332263.1:2001JAN12 5463866H1 1037 1234
2 LI:332263.1:2001JAN12 70313280D1 1098 1667
2 LI:332263.1:2001JAN12 70313919D1 1098 1533
2 LI:332263.1:2001JAN12 70312150D1 1098 1405
2 LI:332263.1:2001JAN12 70312922D1 1099 1614
2 LI:332263.1:2001JAN12 70313561D1 1098 1464
2 LI:332263.1:2001JAN12 6709051H1 1101 1689
2 LI:332263.1:2001JAN12 71852837V1 1125 2056
2 LI:332263.1:2001JAN12 70312621D1 1115 1727
2 LI:332263.1:2001JAN12 g7279826 1119 1580
2 LI:332263.1:2001JAN12 70314358D1 1126 1688
2 LI:332263.1:2001JAN12 3838795H1 1624 1927
2 LI:332263.1:2001JAN12 70312119D1 542 1078
2 LI:332263.1:2001JAN12 70314905D1 544 936
2 LI:332263.1:2001JAN12 4045920H1 545 843
2 LI:332263.1:2001JAN12 7206449H1 572 1142
2 LI:332263.1:2001JAN12 1379201H1 595 760
2 LI:332263.1:2001JAN12 71851568V1 636 1476
2 LI:332263.1:2001JAN12 6292760H1 1447 1667
2 LI:332263.1:2001JAN12 71854502V1 1444 2128
2 LI:332263.1:2001JAN12 71853944V1 1463 2097
2 LI:332263.1:2001JAN12 71852987V1 1548 2374
2 LI:332263.1:2001JAN12 5094039H1 1506 1774
2 LI:332263.1:2001JAN12 g3922068 1536 2000
2 LI:332263.1:2001JAN12 2792057H1 973 1290
2 LI:332263.1:2001JAN12 71853948V1 1619 2269
2 LI:332263.1:2001JAN12 71853977V1 1622 2375
2 LI:332263.1:2001JAN12 71791379V1 1614 1966
2 LI:332263.1:2001JAN12 71852744V1 1617 2535
2 LI:332263.1:2001JAN12 71852463V1 1606 2613
2 LI:332263.1:2001JAN12 71851278V1 1612 2325
2 LI:332263.1:2001JAN12 71852263V1 1552 2473
2 LI:332263.1:2001JAN12 70314220D1 1407 1859
2 LI:332263.1:2001JAN12 71853922V1 1494 2529
2 LI:332263.1:2001JAN12 70314489D1 1431 1835
2 LI:332263.1:2001JAN12 3435614T6 1433 1944
2 LI:332263.1:2001JAN12 71853327V1 1433 2106
2 LI:332263.1:2001JAN12 2699409H1 1434 1726
2 LI:332263.1:2001JAN12 6546955H1 1441 1992
2 LI:332263.1:2001JAN12 71853660V1 1441 1913
2 LI:332263.1:2001JAN12 g2716813 1441 1992
2 LI:332263.1:2001JAN12 6294741H1 1447 1786
2 LI:332263.1:2001JAN12 71854495V1 1488 2361
2 LI:332263.1:2001JAN12 71853577V1 1598 2510
2 LI:332263.1:2001JAN12 71850908V1 1607 2470
2 LI:332263.1:2001JAN12 g3147676 1584 1993
2 LI:332263.1:2001JAN12 71850788V1 1595 2362
2 LI:332263.1:2001JAN12 72333294V1 1589 2170
2 LI:332263.1:2001JAN12 70311711D1 465 989
2 LI:332263.1:2001JAN12 70312113D1 465 936
2 LI:332263.1:2001JAN12 70313820D1 465 936
2 LI:332263.1:2001JAN12 70313450D1 465 930
2 LI:332263.1:2001JAN12 70314581D1 465 913
2 LI:332263.1:2001JAN12 70313195D1 465 856
2 LI:332263.1:2001JAN12 71853150V1 1231 1647
2 LI:332263.1:2001JAN12 71854540V1 1264 2359
2 LI:332263.1:2001JAN12 4067866H1 1254 1555
2 LI:332263.1:2001JAN12 71854017V1 1259 2106
2 LI:332263.1:2001JAN12 71854180V1 1276 2074
2 LI:332263.1:2001JAN12 71852017V1 1271 2193
2 LI:332263.1:2001JAN12 71852613V1 1273 1952
2 LI:332263.1:2001JAN12 1783069H1 1299 1550
2 LI:332263.1:2001JAN12 71854868V1 1304 1854
2 LI:332263.1:2001JAN12 71854722V1 1310 2040
2 LI:332263.1:2001JAN12 71854441V1 1316 2052
2 LI:332263.1:2001JAN12 71852902V1 1318 2084
2 LI:332263.1:2001JAN12 g5591690 1332 1638
2 LI:332263.1:2001JAN12 689587H1 1335 1587
2 LI:332263.1:2001JAN12 71850440V1 1364 2213
2 LI:332263.1:2001JAN12 71852044V1 1390 2459
2 LI:332263.1:2001JAN12 g3988317 1882 1992
2 LI:332263.1:2001JAN12 5852582H1 1883 1992
2 LI:332263.1:2001JAN12 71852773V1 1976 2805
2 LI:332263.1:2001JAN12 g5544052 2019 2259
2 LI:332263.1:2001JAN12 71851901V1 2081 2539
2 LI:332263.1:2001JAN12 6882753H1 2112 2643
2 LI:332263.1:2001JAN12 2464771T6 2148 2207
2 LI:332263.1:2001JAN12 71855066V1 2148 2738
2 LI:332263.1:2001JAN12 956688H1 2203 2259
2 LI:332263.1:2001JAN12 2050914H1 2372 2632
2 LI:332263.1:2001JAN12 6275080H2 2429 2907
2 LI:332263.1:2001JAN12 4631208T6 2496 2611
2 LI:332263.1:2001JAN12 4631208F6 2503 2640
2 LI:332263.1:2001JAN12 4631208H1 2504 2668
2 LI:332263.1:2001JAN12 2675055T6 2801 3292
2 LI:332263.1:2001JAN12 6882753J1 2851 3454
2 LI:332263.1:2001JAN12 g3675996 2881 3330
2 LI:332263.1:2001JAN12 g4738663 2987 3330
2 LI:332263.1:2001JAN12 g5540864 3200 3330
2 LI:332263.1:2001JAN12 71851008V1 1140 1785
2 LI:332263.1:2001JAN12 g1933806 1159 1471
2 LI:332263.1:2001JAN12 g7456875 1179 1578
2 LI:332263.1:2001JAN12 71851136V1 1177 2042
2 LI:332263.1:2001JAN12 71851292V1 1183 2132
2 LI:332263.1:2001JAN12 71854201V1 1205 1881
2 LI:332263.1:2001JAN12 71516539V1 1195 1890
2 LI:332263.1:2001JAN12 2356586F6 1227 1487
2 LI:332263.1:2001JAN12 2356586H1 1227 1475
2 LI:332263.1:2001JAN12 g6713066 832 1290
2 LI:332263.1:2001JAN12 5990242H1 835 1128
2 LI:332263.1:2001JAN12 71514509V1 864 1666
2 LI:332263.1:2001JAN12 5803858H1 909 1231
2 LI:332263.1:2001JAN12 4669412H1 913 1126
2 LI:332263.1:2001JAN12 7445962T1 925 1465
2 LI:332263.1:2001JAN12 70314212D1 939 1369
2 LI:332263.1:2001JAN12 71854789V1 933 1798
2 LI:332263.1:2001JAN12 71853078V1 1005 1905
2 LI:332263.1:2001JAN12 3435614F6 766 1303
2 LI:332263.1:2001JAN12 3435614H1 766 1015
2 LI:332263.1:2001JAN12 2541284H1 767 1020
2 LI:332263.1:2001JAN12 g5878326 782 1244
2 LI:332263.1:2001JAN12 g2878341 822 1195
2 LI:332263.1:2001JAN12 1647392H1 86 258
2 LI:332263.1:2001JAN12 8114707H1 128 628
2 LI:332263.1:2001JAN12 458423R6 213 631
2 LI:332263.1:2001JAN12 458423H1 213 495
2 LI:332263.1:2001JAN12 3166541H1 229 529
2 LI:332263.1:2001JAN12 1256781H1 251 527
2 LI:332263.1:2001JAN12 2464771F6 286 704
2 LI:332263.1:2001JAN12 2464771H1 286 540
2 LI:332263.1:2001JAN12 g746839 347 536
2 LI:332263.1:2001JAN12 2495063F6 395 778
2 LI:332263.1:2001JAN12 2495063H1 395 644
2 LI:332263.1:2001JAN12 70313857D1 435 936
2 LI:332263.1:2001JAN12 70313094D1 435 884
2 LI:332263.1:2001JAN12 70313906D1 465 838
2 LI:332263.1:2001JAN12 70311631D1 465 1079
2 LI:332263.1:2001JAN12 70314994D1 465 988
2 LI:332263.1:2001JAN12 71854192V1 1677 2375
2 LI:332263.1:2001JAN12 71851234V1 1644 2306
2 LI:332263.1:2001JAN12 71856454V1 1875 2556
2 LI:332263.1:2001JAN12 g3934367 1837 2269
2 LI:332263.1:2001JAN12 71855032V1 1534 2237
2 LI:332263.1:2001JAN12 71851502V1 1561 2209
2 LI:332263.1:2001JAN12 70313005D1 1126 1721
2 LI:332263.1:2001JAN12 5327304H1 540 825
2 LI:332263.1:2001JAN12 70311072D1 542 986
2 LI:332263.1:2001JAN12 70313719D1 542 936
2 LI:332263.1:2001JAN12 70314533D1 542 936
2 LI:332263.1:2001JAN12 6080132H1 715 1204
2 LI:332263.1:2001JAN12 4591037H1 736 1000
2 LI:332263.1:2001JAN12 4667110H1 745 936
2 LI:332263.1:2001JAN12 5289580H1 759 1026
2 LI:332263.1:2001JAN12 70313461D1 676 1096
2 LI:332263.1:2001JAN12 70314649D1 685 1096
2 LI:332263.1:2001JAN12 70314563D1 692 1096
2 LI:332263.1:2001JAN12 70313370D1 692 1095
2 LI:332263.1:2001JAN12 70315127D1 710 1096
2 LI:332263.1:2001JAN12 70312983D1 709 936
2 LI:332263.1:2001JAN12 71856291V1 1381 2033
2 LI:332263.1:2001JAN12 70315216D1 1407 1978
2 LI:332263.1:2001JAN12 2495063T6 1550 1946
2 LI:332263.1:2001JAN12 g3701266 1553 1989
2 LI:332263.1:2001JAN12 g3896306 1560 2002
2 LI:332263.1:2001JAN12 g3896000 1578 1998
2 LI:332263.1:2001JAN12 458423F1 1528 2231
2 LI:332263.1:2001JAN12 71852936V1 7 781
2 LI:332263.1:2001JAN12 6022701F7 1 596
2 LI:332263.1:2001JAN12 6022701H1 1 190
2 LI:332263.1:2001JAN12 71850609V1 11 601
2 LI:332263.1:2001JAN12 2675055F6 11 403
2 LI:332263.1:2001JAN12 2675055H1 11 276
2 LI:332263.1:2001JAN12 2677085H1 11 251
2 LI:332263.1:2001JAN12 7062834H1 30 594
2 LI:332263.1:2001JAN12 1647392F6 75 511
2 LI:332263.1:2001JAN12 g6452151 1796 2260
2 LI:332263.1:2001JAN12 g6470559 1845 2259
2 LI:332263.1:2001JAN12 71853046V1 1543 2264
2 LI:332263.1:2001JAN12 70315117D1 1126 1677
2 LI:332263.1:2001JAN12 70313508D1 1126 1405
2 LI:332263.1:2001JAN12 70314381D1 1126 1404
2 LI:332263.1:2001JAN12 70313034D1 465 768
2 LI:332263.1:2001JAN12 3033484H1 465 752
2 LI:332263.1:2001JAN12 70311789D1 466 856
2 LI:332263.1:2001JAN12 70313425D1 466 842
2 LI:332263.1:2001JAN12 70313899D1 470 936
2 LI:332263.1:2001JAN12 70311446D1 465 909
2 LI:332263.1:2001JAN12 70313753D1 472 855
2 LI:332263.1:2001JAN12 5039112H2 488 732
2 LI:332263.1:2001JAN12 70311908D1 536 936
2 LI:332263.1:2001JAN12 g2820485 539 760
2 LI:332263.1:2001JAN12 g1933745 1776 2260
2 LI:332263.1:2001JAN12 71853831V1 1700 2148
2 LI:332263.1:2001JAN12 g4332511 1701 1992
2 LI:332263.1:2001JAN12 4666990H1 1710 1998
2 LI:332263.1:2001JAN12 3267078H1 1711 2035
2 LI:332263.1:2001JAN12 71850938V1 1753 2686
2 LI:332263.1:2001JAN12 71856550V1 1628 2425
2 LI:332263.1:2001JAN12 71854618V1 1662 2017
2 LI:332263.1:2001JAN12 71854970V1 1729 2251
2 LI:332263.1:2001JAN12 71853990V1 1907 2079
3 LI:333886.4:2001JAN12 55098527H1 1 795
3 LI:333886.4:2001JAN12 55138862H1 57 475
3 LI:333886.4:2001JAN12 55138870J1 56 850
3 LI:333886.4:2001JAN12 55138962H1 57 727
3 LI:333886.4:2001JAN12 55138986H1 58 751
3 LI:333886.4:2001JAN12 55138954H1 92 777
3 LI:333886.4:2001JAN12 55138878H1 92 862
3 LI:333886.4:2001JAN12 55138886H1 92 811
3 LI:333886.4:2001JAN12 55138970J1 106 737
3 LI:333886.4:2001JAN12 2932976H1 116 386
3 LI:333886.4:2001JAN12 g6704745 150 703
3 LI:333886.4:2001JAN12 7077452H1 232 815
3 LI:333886.4:2001JAN12 55138894H1 329 522
3 LI:333886.4:2001JAN12 g4734742 477 952
3 LI:333886.4:2001JAN12 1549488H1 505 717
4 LI:478508.1:2001JAN12 4614387H1 175 420
4 LI:478508.1:2001JAN12 5692056H1 180 256
4 LI:478508.1:2001JAN12 6247423H1 124 592
4 LI:478508.1:2001JAN12 6737711T8 195 275
4 LI:478508.1:2001JAN12 8031730J2 1 707
4 LI:478508.1:2001JAN12 2532826H1 76 256
4 LI:478508.1:2001JAN12 8031592J2 112 709
4 LI:478508.1:2001JAN12 2007965R6 286 471
4 LI:478508.1:2001JAN12 2007965T6 331 778
4 LI:478508.1:2001JAN12 g5850995 373 797
4 LI:478508.1:2001JAN12 g2953720 405 798
4 LI:478508.1:2001JAN12 g7280553 435 794
4 LI:478508.1:2001JAN12 g5446115 609 800
4 LI:478508.1:2001JAN12 2007965H1 212 413
4 LI:478508.1:2001JAN12 4617844F6 264 769
4 LI:478508.1:2001JAN12 4617844H1 264 516
4 LI:478508.1:2001JAN12 5449985H1 197 256
5 LI:307470.1:2001JAN12 55025674H1 448 963
5 LI:307470.1:2001JAN12 55025674J1 445 962
5 LI:307470.1:2001JAN12 g2881602 443 626
5 LI:307470.1:2001JAN12 6885575J1 431 544
5 LI:307470.1:2001JAN12 8053491J1 1 523
6 LI:058298.1:2001JAN12 71870273V1 1083 1678
6 LI:058298.1:2001JAN12 5314910H1 1083 1331
6 LI:058298.1:2001JAN12 1698381T6 670 1228
6 LI:058298.1:2001JAN12 1698381F6 417 915
6 LI:058298.1:2001JAN12 g2539246 558 852
6 LI:058298.1:2001JAN12 55067990H1 389 682
6 LI:058298.1:2001JAN12 55067990J1 389 681
6 LI:058298.1:2001JAN12 55068290J1 1 681
6 LI:058298.1:2001JAN12 55068296J1 32 680
[Figure imgf000206_0001]
LI:072560.1:2001JAN12 7383278H1 1160 1563
LI:072560.1:2001JAN12 70720947V1 1054 1522
LI:072560.1:2001JAN12 3461875H1 1445 1502
LI:072560.1:2001JAN12 70718361V1 882 1213
LI:072560.1:2001JAN12 6888979H1 885 1140
LI:072560.1:2001JAN12 70718581V1 860 1140
LI:072560.1:2001JAN12 70721033V1 877 1140
LI:072560.1:2001JAN12 70715974V1 897 1140
LI:072560.1:2001JAN12 70716831V1 498 1130
LI:072560.1:2001JAN12 70719723V1 484 1128
LI:072560.1:2001JAN12 70716832V1 478 1088
LI:072560.1:2001JAN12 7290565R8 879 1011
LI:072560.1:2001JAN12 7290565R6 797 1011
LI:072560.1:2001JAN12 g4565346 2337 2584
LI:072560.1:2001JAN12 6078609H1 2428 2584
LI:072560.1:2001JAN12 g2556167 2200 2558
LI:072560.1:2001JAN12 g2223594 2201 2480
LI:072560.1:2001JAN12 8051578J1 1917 2442
LI:072560.1:2001JAN12 70647879V1 2216 2387
LI:072560.1:2001JAN12 2153472T6 1780 2362
LI:072560.1:2001JAN12 2828618T6 1777 2350
LI:072560.1:2001JAN12 2153472F6 1701 2201
LI:072560.1:2001JAN12 70717413V1 1793 2198
LI:072560.1:2001JAN12 70720711V1 1698 2084
LI:072560.1:2001JAN12 70715804V1 1477 2072
LI:072560.1:2001JAN12 5182174H2 1955 2055
LI:072560.1:2001JAN12 70720554V1 1456 2013
LI:072560.1:2001JAN12 70717671V1 1683 1912
LI:072560.1:2001JAN12 g2000573 1573 1901
LI:072560.1:2001JAN12 70717070V1 1464 1899
LI:072560.1:2001JAN12 70716090V1 1683 1895
LI:072560.1:2001JAN12 70711495V1 1734 1859
LI:072560.1:2001JAN12 2153472H1 1701 1810
LI:072560.1:2001JAN12 70647864V1 1439 1746
LI:072560.1:2001JAN12 70715717V1 1441 1746
LI:072560.1:2001JAN12 70712315V1 913 1633
LI:072560.1:2001JAN12 70720508V1 1052 1620
LI:072560.1:2001JAN12 70718736V1 1439 1608
LI:072560.1:2001JAN12 70715911V1 1473 1594
LI:072560.1:2001JAN12 70716279V1 1384 1594
LI:072560.1:2001JAN12 7290765R8 880 1008
LI:072560.1:2001JAN12 70716585V1 372 877
LI:072560.1:2001JAN12 70716866V1 372 828
LI:072560.1:2001JAN12 2828618F6 372 859
LI:072560.1:2001JAN12 2828618H1 372 674
LI:072560.1:2001JAN12 5501305R6 1 440
LI:1953096.1:2001JAN12 6391308T8 1 620
LI:1076016.1:2001JAN12 8186651H1 298 966
LI:1076016.1:2001JAN12 71934981V1 362 811
LI:1076016.1:2001JAN12 71924015V1 362 825
13 LI:1076016.1:2001JAN12 7463085H1 1 592
13 LI:1076016.1:2001JAN12 g5100484 32 221
13 LI:1076016.1:2001JAN12 g5810696 42 222
13 LI:1076016.1:2001JAN12 8015729J1 223 820
13 LI:1076016.1:2001JAN12 8054154J1 223 699
13 LI:1076016.1:2001JAN12 7608984J1 225 782
14 LI:2082796.1:2001JAN12 8273177T1 1 591
15 LI:335681.3:2001JAN12 71652050V1 1 652
15 LI:335681.3:2001JAN12 71657465V1 2 583
15 LI:335681.3:2001JAN12 3171186F6 1 420
15 LI:335681.3:2001JAN12 3168910H1 1 198
15 LI:335681.3:2001JAN12 2820819H1 50 380
15 LI:335681.3:2001JAN12 5027433H1 346 555
15 LI:335681.3:2001JAN12 3171186T6 375 862
15 LI:335681.3:2001JAN12 2013111H1 405 621
15 LI:335681.3:2001JAN12 g4306593 452 901
15 LI:335681.3:2001JAN12 g680951 487 724
15 LI:335681.3:2001JAN12 3934551H1 521 801
15 LI:335681.3:2001JAN12 3934519H1 522 808
15 LI:335681.3:2001JAN12 2811041T6 583 857
15 LI:335681.3:2001JAN12 70623951V1 531
15 LI:335681.3:2001JAN12 71657365V1 580
15 LI:335681.3:2001JAN12 71653874V1 514
15 LI:335681.3:2001JAN12 70621934V1 437
15 LI:335681.3:2001JAN12 3171186H1 198
15 LI:335681.3:2001JAN12 71597437V1 639
15 LI:335681.3:2001JAN12 2811041F6 590 896
15 LI:335681.3:2001JAN12 2811041H1 590 873
15 LI:335681.3:2001JAN12 5870171H1 632 917
15 LI:335681.3:2001JAN12 g3922387 734 1165
15 LI:335681.3:2001JAN12 g3785451 796 1115
16 LI:214150.1:2001JAN12 g1733190 602 718
16 LI:214150.1:2001JAN12 6890523H1 605 703
16 LI:214150.1:2001JAN12 71984167V1 1 595
16 LI:214150.1:2001JAN12 71984038V1 159 703
16 LI:214150.1:2001JAN12 4646294F6 422 703
16 LI:214150.1:2001JAN12 4646294H1 422 689
16 LI:214150.1:2001JAN12 3703252F6 495 703
16 LI:214150.1:2001JAN12 3703252H1 495 717
17 LI:322783.15:2001JAN12 7208772H1 1 115
17 LI:322783.15:2001JAN12 7609888J1 131 685
17 LI:322783.15:2001JAN12 2650602H1 184 425
17 LI:322783.15:2001JAN12 2763906H1 195 433
17 LI:322783.15:2001JAN12 7207858T8 317 765
17 LI:322783.15:2001JAN12 7154586H1 1 527
17 LI:322783.15:2001JAN12 7206356H1 1 490
17 LI:322783.15:2001JAN12 7207535H1 3 379
17 LI:322783.15:2001JAN12 7209140H1 1 622
17 LI:322783.15:2001JAN12 7209489H1 1 618
17 LI:322783.15:2001JAN12 7210871H1 1 603
17 LI:322783.15:2001JAN12 7685079H1 1 550
17 LI:322783.15:2001JAN12 6964009H1 1 560
18 LI:422993.1:2001JAN12 6310043H1 1 415
18 LI:422993.1:2001JAN12 6310340F8 1 394
18 LI:422993.1:2001JAN12 4722590H1 52 213
18 LI:422993.1:2001JAN12 6310043T8 205 700
18 LI:422993.1:2001JAN12 8094246H1 542 1031
18 LI:422993.1:2001JAN12 7594841H1 808 1359
18 LI:422993.1:2001JAN12 g6639991 1013 1429
18 LI:422993.1:2001JAN12 g7457143 1021 1425
18 LI:422993.1:2001JAN12 g7320380 1064 1434
19 LI:1172885.1:2001JAN12 1923488T6 101 575
19 LI:1172885.1:2001JAN12 4768094T6 42 599
19 LI:1172885.1:2001JAN12 4874348H1 194 468
19 LI:1172885.1:2001JAN12 1923296T6 203 574
19 LI:1172885.1:2001JAN12 5085341F6 204 638
19 LI:1172885.1:2001JAN12 1923488R6 1 438
19 LI:1172885.1:2001JAN12 1923296R6 1 389
19 LI:1172885.1:2001JAN12 1923296H1 1 299
19 LI:1172885.1:2001JAN12 1923488H1 1 280
19 LI:1172885.1:2001JAN12 5760908T8 24 486
19 LI:1172885.1:2001JAN12 5085341H1 182 384
19 LI:1172885.1:2001JAN12 1914106H1 242 502
20 LI:1088359.1:2001JAN12 3956181H1 1975 2272
20 LI:1088359.1:2001JAN12 6779755F6 1953 2564
20 LI:1088359.1:2001JAN12 3603043H1 1971 2277
20 LI:1088359.1:2001JAN12 5309011H1 1972 2212
20 LI:1088359.1:2001JAN12 2201396T6 1496 2090
20 LI:1088359.1:2001JAN12 71250004V1 1516 2177
20 LI:1088359.1:2001JAN12 71063424V1 1205 1871
20 LI:1088359.1:2001JAN12 71066769V1 1166 1771
20 LI:1088359.1:2001JAN12 71063893V1 1146 1697
20 LI:1088359.1:2001JAN12 3169092H1 1948 2220
20 LI:1088359.1:2001JAN12 71834740V1 1198 2149
20 LI:1088359.1:2001JAN12 6532650H1 1247 1746
20 LI:1088359.1:2001JAN12 71249332V1 1931 2620
20 LI:1088359.1:2001JAN12 71064441V1 1780 2297
20 LI:1088359.1:2001JAN12 71249705V1 1822 2360
20 LI:1088359.1:2001JAN12 5203628H1 1841 2141
20 LI:1088359.1:2001JAN12 71065750V1 1710 2238
20 LI:1088359.1:2001JAN12 2503778F6 1732 2264
20 LI:1088359.1:2001JAN12 2503778H1 1732 2000
20 LI:1088359.1:2001JAN12 861926H1 1484 1742
20 LI:1088359.1:2001JAN12 71836277V1 1133 2130
20 LI:1088359.1:2001JAN12 71835532V1 1705 2334
20 LI:1088359.1:2001JAN12 71834886V1 1627 2541
20 LI:1088359.1:2001JAN12 6702383H1 1135 1763
20 LI:1088359.1:2001JAN12 71249827V1 1084 1780
20 LI:1088359.1:2001JAN12 6023487H1 1453 1758
20 LI:1088359.1:2001JAN12 4597863H1 1455 1710
20 LI:1088359.1:2001JAN12 71834883V1 1410 2328
20 LI:1088359.1:2001JAN12 1617721F6 1424 1890
20 LI:1088359.1:2001JAN12 1617721H1 1424 1597
20 LI:1088359.1:2001JAN12 71835736V1 1479 2201
20 LI:1088359.1:2001JAN12 4550412H1 1600 1865
20 LI:1088359.1:2001JAN12 5216519H1 1621 1769
20 LI:1088359.1:2001JAN12 71250431V1 1632 2303
20 LI:1088359.1:2001JAN12 71835228V1 1039 1291
20 LI:1088359.1:2001JAN12 2762282H1 1069 1323
20 LI:1088359.1:2001JAN12 71063570V1 934 1567
20 LI:1088359.1:2001JAN12 71837048V1 917 1919
20 LI:1088359.1:2001JAN12 71063612V1 1342 1951
20 LI:1088359.1:2001JAN12 71837212V1 1390 2090
20 LI:1088359.1:2001JAN12 71839966V1 1279 1781
20 LI:1088359.1:2001JAN12 2790964H1 1303 1627
20 LI:1088359.1:2001JAN12 71063820V1 1328 1970
20 LI:1088359.1:2001JAN12 71063338V1 1340 2064
20 LI:1088359.1:2001JAN12 71838280V1 1326 1764
20 LI:1088359.1:2001JAN12 71838522V1 2027 2636
20 LI:1088359.1:2001JAN12 4539127H1 2042 2325
20 LI:1088359.1:2001JAN12 71065280V1 1566 2201
20 LI:1088359.1:2001JAN12 2201388T6 1573 2089
20 LI:1088359.1:2001JAN12 71248778V1 1555 2067
20 LI:1088359.1:2001JAN12 3346980H1 1272 1572
20 LI:1088359.1:2001JAN12 71065161V1 1260 1972
20 LI:1088359.1:2001JAN12 6423133H1 889 1467
20 LI:1088359.1:2001JAN12 71249545V1 849 1498
20 LI:1088359.1:2001JAN12 71066585V1 1 602
20 LI:1088359.1:2001JAN12 4059690H1 1 256
20 LI:1088359.1:2001JAN12 4059690F6 1 211
20 LI:1088359.1:2001JAN12 6490754R6 89 647
20 LI:1088359.1:2001JAN12 6490754R9 109 644
20 LI:1088359.1:2001JAN12 7216266H1 158 715
20 LI:1088359.1:2001JAN12 71835294V1 165 1068
20 LI:1088359.1:2001JAN12 4791776F8 165 790
20 LI:1088359.1:2001JAN12 4791776H1 165 442
20 LI:1088359.1:2001JAN12 71834608V1 165 324
20 LI:1088359.1:2001JAN12 71835333V1 199 1082
20 LI:1088359.1:2001JAN12 5923409H1 259 577
20 LI:1088359.1:2001JAN12 71066229V1 279 969
20 LI:1088359.1:2001JAN12 6472286H1 338 968
20 LI:1088359.1:2001JAN12 71065422V1 341 972
20 LI:1088359.1:2001JAN12 71064427V1 358 900
20 LI:1088359.1:2001JAN12 71836729V1 447 861
20 LI:1088359.1:2001JAN12 71835921V1 462 1128
20 LI:1088359.1:2001JAN12 71838277V1 463 900
20 LI:1088359.1:2001JAN12 71064989V1 476 1171
20 LI:1088359.1:2001JAN12 71064876V1 504 1115
20 LI:1088359.1:2001JAN12 71837042V1 507 1363
20 LI:1088359.1:2001JAN12 71835214V1 531 1286
20 LI:1088359.1:2001JAN12 71063437V1 520 1234
20 LI:1088359.1:2001JAN12 71065765V1 551 1211
20 LI:1088359.1:2001JAN12 71063360V1 577 1282
20 LI:1088359.1:2001JAN12 71837470V1 599 1244
20 LI:1088359.1:2001JAN12 71835745V1 625 1324
20 LI:1088359.1:2001JAN12 71066606V1 633 1393
20 LI:1088359.1:2001JAN12 71066167V1 638 1330
20 LI:1088359.1:2001JAN12 71834968V1 645 1276
20 LI:1088359.1:2001JAN12 71835067V1 645 1284
20 LI:1088359.1:2001JAN12 71834934V1 647 1346
20 LI:1088359.1:2001JAN12 71249028V1 672 1340
20 LI:1088359.1:2001JAN12 71835549V1 664 1305
20 LI:1088359.1:2001JAN12 71835748V1 668 1328
20 LI:1088359.1:2001JAN12 3778142H1 684 997
20 LI:1088359.1:2001JAN12 71837689V1 691 1508
20 LI:1088359.1:2001JAN12 71834711V1 692 1224
20 LI:1088359.1:2001JAN12 71835154V1 694 1329
20 LI:1088359.1:2001JAN12 71836394V1 698 1369
20 LI:1088359.1:2001JAN12 71834554V1 699 1328
20 LI:1088359.1:2001JAN12 5379413H1 703 968
20 LI:1088359.1:2001JAN12 71249318V1 716 1449
20 LI:1088359.1:2001JAN12 71064614V1 729 1417
20 LI:1088359.1:2001JAN12 71836396V1 729 1554
20 LI:1088359.1:2001JAN12 70868279V1 828 1528
20 LI:1088359.1:2001JAN12 2201396F6 811 1250
20 LI:1088359.1:2001JAN12 2201396H1 811 1104
20 LI:1088359.1:2001JAN12 4725513H1 2024 2310
20 LI:1088359.1:2001JAN12 6779755H1 2013 2582
20 LI:1088359.1:2001JAN12 5512450H1 1252 1497
20 LI:1088359.1:2001JAN12 71837110V1 1010 1782
20 LI:1088359.1:2001JAN12 71064542V1 1 548
20 LI:1088359.1:2001JAN12 5335469H1 2005 2242
20 LI:1088359.1:2001JAN12 5335451H1 2005 2244
20 LI:1088359.1:2001JAN12 71063814V1 971 1667
20 LI:1088359.1:2001JAN12 71249963V1 1516 2117
20 LI:1088359.1:2001JAN12 71063801V1 936 1572
20 LI:1088359.1:2001JAN12 g1297935 930 1156
20 LI:1088359.1:2001JAN12 71063143V1 936 1615
20 LI:1088359.1:2001JAN12 71836240V1 1200 2060
20 LI:1088359.1:2001JAN12 71835981V1 2106 2665
20 LI:1088359.1:2001JAN12 7100805H1 2108 2581
20 LI:1088359.1:2001JAN12 4550412T1 2113 2637
20 LI:1088359.1:2001JAN12 1617721T6 2123 2672
20 LI:1088359.1:2001JAN12 2503778T6 2164 2661
20 LI:1088359.1:2001JAN12 4059690T6 2170 2633
20 LI:1088359.1:2001JAN12 5897856H1 2170 2444
20 LI:1088359.1:2001JAN12 5897857H1 2170 2442
20 LI:1088359.1:2001JAN12 g6300913 2202 2633
20 LI:1088359.1:2001JAN12 g6576959 2222 2633
20 LI:1088359.1:2001JAN12 g4852169 2226 2633
20 LI:1088359.1:2001JAN12 2230817F6 2259 2765
20 LI:1088359.1:2001JAN12 2230817H1 2259 2511
20 LI:1088359.1:2001JAN12 2585714F6 2262 2817
20 LI:1088359.1:2001JAN12 2585714H1 2262 2537
20 LI:1088359.1:2001JAN12 5691260H1 2263 2530
20 LI:1088359.1:2001JAN12 7350573H1 2263 2709
20 LI:1088359.1:2001JAN12 5084156H1 2267 2521
20 LI:1088359.1:2001JAN12 2585714T6 2309 2811
20 LI:1088359.1:2001JAN12 7259781T6 2314 2519
20 LI:1088359.1:2001JAN12 g1267511 2327 2705
20 LI:1088359.1:2001JAN12 1422186H1 2348 2595
20 LI:1088359.1:2001JAN12 1421986H1 2348 2559
20 LI:1088359.1:2001JAN12 3599245H1 2353 2633
20 LI:1088359.1:2001JAN12 g6462695 2370 2849
20 LI:1088359.1:2001JAN12 g4970568 2371 2846
20 LI:1088359.1:2001JAN12 1311825H1 2369 2628
20 LI:1088359.1:2001JAN12 g3770489 2387 2852
20 LI:1088359.1:2001JAN12 g4264897 2429 2848
20 LI:1088359.1:2001JAN12 g2969595 2434 2633
20 LI:1088359.1:2001JAN12 g3430805 2437 2849
20 LI:1088359.1:2001JAN12 g3758006 2452 2828
20 LI:1088359.1:2001JAN12 g3755236 2458 2846
20 LI:1088359.1:2001JAN12 g7278916 2461 2848
20 LI:1088359.1:2001JAN12 g3298627 2466 2625
20 LI:1088359.1:2001JAN12 3424273H1 2493 2633
20 LI:1088359.1:2001JAN12 g3191621 2514 2847
20 LI:1088359.1:2001JAN12 g2910683 2519 2770
20 LI:1088359.1:2001JAN12 241452H1 2543 2637
20 LI:1088359.1:2001JAN12 g3134539 2641 2850
20 LI:1088359.1:2001JAN12 g6035581 2726 2846
21 LI:813422.1:2001JAN12 894136H1 24 202
21 LI:813422.1:2001JAN12 3685359F6 46 515
21 LI:813422.1:2001JAN12 2497517F6 93 592
21 LI:813422.1:2001JAN12 2497517H1 93 413
21 LI:813422.1:2001JAN12 70167792V1 98 520
21 LI:813422.1:2001JAN12 7992231H1 104 554
21 LI:813422.1:2001JAN12 70168921V1 131 651
21 LI:813422.1:2001JAN12 8124465H1 11 671
21 LI:813422.1:2001JAN12 70164598V1 1430 1890
21 LI:813422.1:2001JAN12 g1485371 1771 1911
21 LI:813422.1:2001JAN12 6593962F8 1 169
21 LI:813422.1:2001JAN12 6593962H1 1 169
21 LI:813422.1:2001JAN12 7978969H1 11 650
21 LI:813422.1:2001JAN12 893591H1 25 315
21 LI:813422.1:2001JAN12 8102680H1 45 624
21 LI:813422.1:2001JAN12 3685359H1 46 358
21 LI:813422.1:2001JAN12 g4689970 203 625
21 LI:813422.1:2001JAN12 3685359T6 278 556
21 LI:813422.1:2001JAN12 2497517T6 380 534
21 LI:813422.1:2001JAN12 70165351V1 407 888
21 LI:813422.1:2001JAN12 70169649V1 392 534
21 LI:813422.1:2001JAN12 70165894V1 426 932
21 LI:813422.1:2001JAN12 g4074539 413 534
21 LI:813422.1:2001JAN12 70166959V1 491 1010
21 LI:813422.1:2001JAN12 70166310V1 505 984
21 LI:813422.1:2001JAN12 70169334V1 600 1080
21 LI:813422.1:2001JAN12 70168807V1 731 1199
21 LI:813422.1:2001JAN12 70165598V1 822 1303
21 LI:813422.1:2001JAN12 70166092V1 869 1371
21 LI:813422.1:2001JAN12 1006417H1 967 1150
21 LI:813422.1:2001JAN12 70165798V1 998 1514
21 LI:813422.1:2001JAN12 70164520V1 1059 1493
21 LI:813422.1:2001JAN12 70166061V1 1116 1651
21 LI:813422.1:2001JAN12 6526780H1 1118 1714
21 LI:813422.1:2001JAN12 70168451V1 1137 1627
21 LI:813422.1:2001JAN12 5263856H2 1149 1244
21 LI:813422.1:2001JAN12 3338768H1 1244 1485
21 LI:813422.1:2001JAN12 70166692V1 1277 1766
21 LI:813422.1:2001JAN12 466542H1 1322 1470
21 LI:813422.1:2001JAN12 6433091H1 1331 1735
21 LI:813422.1:2001JAN12 6433091T8 1413 1735
22 LI:1186426.1:2001JAN12 4030602T6 1584 1906
22 LI:1186426.1:2001JAN12 7068262H1 1789 1906
22 LI:1186426.1:2001JAN12 1749930H1 1799 1906
22 LI:1186426.1:2001JAN12 3929925T6 1513 1906
22 LI:1186426.1:2001JAN12 3765992T6 1529 1906
22 LI:1186426.1:2001JAN12 3973648H1 1445 1721
22 LI:1186426.1:2001JAN12 2639995T6 1563 1906
22 LI:1186426.1:2001JAN12 4140846T9 26 550
22 LI:1186426.1:2001JAN12 7254119H1 491 935
22 LI:1186426.1:2001JAN12 3280090F7 1697 1906
22 LI:1186426.1:2001JAN12 g389786 1040 1462
22 LI:1186426.1:2001JAN12 1269724F1 1117 1602
22 LI:1186426.1:2001JAN12 6934544H1 1361 1875
22 LI:1186426.1:2001JAN12 111294F1 1503 1906
22 LI:1186426.1:2001JAN12 3418022H2 1525 1769
22 LI:1186426.1:2001JAN12 1309523H1 961 1135
22 LI:1186426.1:2001JAN12 g3281713 1105 1463
22 LI:1186426.1:2001JAN12 111294T6 1556 1906
22 LI:1186426.1:2001JAN12 g6476089 1689 1906
22 LI:1186426.1:2001JAN12 1269724H1 1117 1324
22 LI:1186426.1:2001JAN12 3236461H1 424 641
22 LI:1186426.1:2001JAN12 2639995H1 830 1076
22 LI:1186426.1:2001JAN12 4326477F6 1 381
22 LI:1186426.1:2001JAN12 4123479H1 1222 1457
22 LI:1186426.1:2001JAN12 4729333H1 1304 1392
22 LI:1186426.1:2001JAN12 g2823562 1641 1906
22 LI:1186426.1:2001JAN12 g2162375 725 915
22 LI:1186426.1:2001JAN12 6420281F7 653 1184
22 LI:1186426.1:2001JAN12 6166536F8 739 1220
22 LI:1186426.1:2001JAN12 2639995F6 830 1347
22 LI:1186426.1:2001JAN12 5089719H1 1005 1291
22 LI:1186426.1:2001JAN12 g1158071 1771 1906
22 LI:1186426.1:2001JAN12 g5394522 1629 1906
22 LI:1186426.1:2001JAN12 g4308651 1731 1906
22 LI:1186426.1:2001JAN12 7278738H1 1026 1590
22 LI:1186426.1:2001JAN12 1396166H1 1252 1508
22 LI:1186426.1:2001JAN12 7081835H1 125 677
22 LI:1186426.1:2001JAN12 1946252H1 961 1173
22 LI:1186426.1:2001JAN12 2964009F6 1432 1743
22 LI:1186426.1:2001JAN12 2964009H1 1433 1733
22 LI:1186426.1:2001JAN12 8016538J1 650 1137
22 LI:1186426.1:2001JAN12 7073510H1 119 689
22 LI:1186426.1:2001JAN12 6594866H2 204 442
22 LI:1186426.1:2001JAN12 7254119R8 491 1055
22 LI:1186426.1:2001JAN12 6405368H1 315 599
22 LI:1186426.1:2001JAN12 g1994599 1754 1906
22 LI:1186426.1:2001JAN12 2639996F6 830 1218
22 LI:1186426.1:2001JAN12 2639996T6 1408 1938
22 LI:1186426.1:2001JAN12 4779651H1 1413 1646
22 LI:1186426.1:2001JAN12 3398318H1 1431 1691
22 LI:1186426.1:2001JAN12 6420281T8 1446 1958
22 LI:1186426.1:2001JAN12 3973648T7 1492 1948
22 LI:1186426.1:2001JAN12 815993H1 1490 1725
22 LI:1186426.1:2001JAN12 g1636974 1586 1880
22 LI:1186426.1:2001JAN12 g4971669 1595 1906
22 LI:1186426.1:2001JAN12 4326477T6 1606 1906
22 LI:1186426.1:2001JAN12 3280090H1 1614 1837
22 LI:1186426.1:2001JAN12 5658235H1 1688 1942
22 LI:1186426.1:2001JAN12 7068162H1 1734 1906
22 LI:1186426.1:2001JAN12 3530095H1 1794 1938
22 LI:1186426.1:2001JAN12 3530095F6 1795 1906
22 LI:1186426.1:2001JAN12 400967H1 1110 1287
22 LI:1186426.1:2001JAN12 4326477H1 3 160
22 LI:1186426.1:2001JAN12 3752010T6 1737 1915
22 LI:1186426.1:2001JAN12 3888314H1 52 314
22 LI:1186426.1:2001JAN12 1335071H1 515 737
22 LI:1186426.1:2001JAN12 2639980H1 830 1081
22 LI:1186426.1:2001JAN12 1760413H1 1442 1684
22 LI:1186426.1:2001JAN12 1784584H1 430 641
22 LI:1186426.1:2001JAN12 1269724F6 1117 1519
23 LI:1182817.1:2001JAN12 936773R6 1848 2321
23 LI:1182817.1:2001JAN12 4515965F8 109 297
23 LI:1182817.1:2001JAN12 630308H1 107 236
23 LI:1182817.1:2001JAN12 4526224H1 111 388
23 LI:1182817.1:2001JAN12 71076180V1 1997 2541
23 LI:1182817.1:2001JAN12 71075053V1 2008 2480
23 LI:1182817.1:2001JAN12 4761107H1 1612 1898
23 LI:1182817.1:2001JAN12 5283776H1 1559 1724
23 LI:1182817.1:2001JAN12 70012416D1 4036 4158
23 LI:1182817.1:2001JAN12 70004713D1 3977 4158
23 LI:1182817.1:2001JAN12 71078853V1 1642 2263
23 LI:1182817.1:2001JAN12 6815352F8 4518 4581
23 LI:1182817.1:2001JAN12 5296321T6 2513 2895
23 LI:1182817.1:2001JAN12 70818565V1 2586 2917
23 LI:1182817.1:2001JAN12 g2820002 3346 3673
23 LI:1182817.1:2001JAN12 2192463H1 3763 3895
23 LI:1182817.1:2001JAN12 4054850H1 121 409
23 LI:1182817.1:2001JAN12 2072020H1 98 350
23 LI:1182817.1:2001JAN12 70875970V1 2134 2337
23 LI:1182817.1:2001JAN12 6045182H1 1880 2331
23 LI:1182817.1:2001JAN12 7979070H1 1687 2316
23 LI:1182817.1:2001JAN12 5580371H2 2064 2313
23 LI:1182817.1:2001JAN12 5989373F6 3959 4158
23 LI:1182817.1:2001JAN12 7119092F8 167 697
23 LI:1182817.1:2001JAN12 g5365749 3819 4158
23 LI:1182817.1:2001JAN12 g2344282 447 712
23 LI:1182817.1:2001JAN12 g5398345 455 712
23 LI:1182817.1:2001JAN12 5960949F8 542 1148
23 LI:1182817.1:2001JAN12 3663229H1 523 801
23 LI:1182817.1:2001JAN12 71080192V1 1837 2262
23 LI:1182817.1:2001JAN12 71077979V1 2080 2499
23 LI:1182817.1:2001JAN12 5702634H1 112 376
23 LI:1182817.1:2001JAN12 5312350H1 115 363
23 LI:1182817.1:2001JAN12 g3959779 145 464
23 LI:1182817.1:2001JAN12 7405974H1 646 1035
23 LI:1182817.1:2001JAN12 70876563V1 2245 2448
23 LI:1182817.1:2001JAN12 4441737F8 2420 3049
23 LI:1182817.1:2001JAN12 4441737T8 2420 2969
23 LI:1182817.1:2001JAN12 71080855V1 2496 2934
23 LI:1182817.1:2001JAN12 5081881H1 1195 1378
23 LI:1182817.1:2001JAN12 5950751F6 116 730
23 LI:1182817.1:2001JAN12 g498151 107 5577
23 LI:1182817.1:2001JAN12 5985407T6 116 539
23 LI:1182817.1:2001JAN12 625247R6 150 759
23 LI:1182817.1:2001JAN12 6983827H1 173 507
23 LI:1182817.1:2001JAN12 7030415R6 274 868
23 LI:1182817.1:2001JAN12 g4088247 294 764
23 LI:1182817.1:2001JAN12 4107871T6 438 685
23 LI:1182817.1:2001JAN12 6989339H1 118 288
23 LI:1182817.1:2001JAN12 70000686D1 3809 4158
23 LI:1182817.1:2001JAN12 70002146D1 3687 3895
23 LI:1182817.1:2001JAN12 7405488H1 3860 4158
23 LI:1182817.1:2001JAN12 6885580H1 1034 1349
23 LI:1182817.1:2001JAN12 g1258928 1898 2185
23 LI:1182817.1:2001JAN12 3680137H1 3484 3771
23 LI:1182817.1:2001JAN12 g3147068 3387 3734
23 LI:1182817.1:2001JAN12 078484R6 4498 4581
23 LI:1182817.1:2001JAN12 70003344D1 3955 4158
23 LI:1182817.1:2001JAN12 2280513R6 95 529
23 LI:1182817.1:2001JAN12 g2703382 3881 4158
23 LI:1182817.1:2001JAN12 g5152312 2556 2933
23 LI:1182817.1:2001JAN12 3957291H2 112 389
23 LI:1182817.1:2001JAN12 1650615H1 103 319
23 LI:1182817.1:2001JAN12 625247H1 150 412
23 LI:1182817.1:2001JAN12 5283776F7 1559 2015
23 LI:1182817.1:2001JAN12 70009908D1 3990 4158
23 LI:1182817.1:2001JAN12 5704237H1 112 386
23 LI:1182817.1:2001JAN12 70008902D1 3983 4466
23 LI:1182817.1:2001JAN12 4890756F6 4034 4570
23 LI:1182817.1:2001JAN12 4890756H1 4034 4308
23 LI:1182817.1:2001JAN12 70010596D1 4064 4483
23 LI:1182817.1:2001JAN12 1979451H1 4100 4301
23 LI:1182817.1:2001JAN12 078484H1 4239 4537
23 LI:1182817.1:2001JAN12 4057979H1 4452 4581
23 LI:1182817.1:2001JAN12 g6576327 470 712
23 LI:1182817.1:2001JAN12 71075096V1 1885 2328
23 LI:1182817.1:2001JAN12 71079767V1 2401 2932
23 LI:1182817.1:2001JAN12 5950751H1 111 432
23 LI:1182817.1:2001JAN12 2344725H1 119 340
23 LI:1182817.1:2001JAN12 g1476715 3471 3896
23 LI:1182817.1:2001JAN12 7329981H1 1781 2217
23 LI:1182817.1:2001JAN12 g4687524 1821 2239
23 LI:1182817.1:2001JAN12 736134H1 613 712
23 LI:1182817.1:2001JAN12 3047721H1 112 407
23 LI:1182817.1:2001JAN12 71834483V1 91 951
23 LI:1182817.1:2001JAN12 5701788T7 111 596
23 LI:1182817.1:2001JAN12 6407301H1 88 582
23 LI:1182817.1:2001JAN12 4594380H2 88 350
23 LI:1182817.1:2001JAN12 2280513T6 220 674
23 LI:1182817.1:2001JAN12 2052608H1 908 1199
23 LI:1182817.1:2001JAN12 5293501H1 1176 1366
23 LI:1182817.1:2001JAN12 5960949H1 533 1094
23 LI:1182817.1:2001JAN12 71834509V1 499 875
23 LI:1182817.1:2001JAN12 6885580F8 903 1349
23 LI:1182817.1:2001JAN12 7636187H1 1318 1792
23 LI:1182817.1:2001JAN12 1877551H1 4090 4158
23 LI:1182817.1:2001JAN12 3048843H1 112 395
23 LI:1182817.1:2001JAN12 3209530F6 3421 3911
23 LI:1182817.1:2001JAN12 5403290H1 3509 3773
23 LI:1182817.1:2001JAN12 4593603H1 3562 3860
23 LI:1182817.1:2001JAN12 70010308D1 3687 4036
23 LI:1182817.1:2001JAN12 g6709927 3739 4191
23 LI:1182817.1:2001JAN12 2192463F6 3763 4192
23 LI:1182817.1:2001JAN12 5989373H1 3938 4188
23 LI:1182817.1:2001JAN12 6058282F8 3958 4158
23 LI:1182817.1:2001JAN12 5684836H1 98 332
23 LI:1182817.1:2001JAN12 7119092F6 167 771
23 LI:1182817.1:2001JAN12 71079787V1 2012 2631
23 LI:1182817.1:2001JAN12 71078724V1 1740 2342
23 LI:1182817.1:2001JAN12 6045182J1 1880 2331
23 LI:1182817.1:2001JAN12 71078819V1 1996 2663
23 LI:1182817.1:2001JAN12 70011951D1 3928 4158
23 LI:1182817.1:2001JAN12 7033305H1 1702 2261
23 LI:1182817.1:2001JAN12 5651323H1 1723 2267
23 LI:1182817.1:2001JAN12 2280513H1 95 376
23 LI:1182817.1:2001JAN12 7430784H1 2310 2549
23 LI:1182817.1:2001JAN12 6815352H1 4518 4603
23 LI:1182817.1:2001JAN12 7035275H1 4498 4592
23 LI:1182817.1:2001JAN12 70006655D1 4498 4790
23 LI:1182817.1:2001JAN12 4885835H1 4475 4581
23 LI:1182817.1:2001JAN12 3209530T6 4498 4570
23 LI:1182817.1:2001JAN12 70005861D1 4498 4581
23 LI:1182817.1:2001JAN12 2766706H1 115 406
23 LI:1182817.1:2001JAN12 71077855V1 2080 2506
23 LI:1182817.1:2001JAN12 5701788F7 111 698
23 LI:1182817.1:2001JAN12 4441737H1 2420 2554
23 LI:1182817.1:2001JAN12 g1163665 107 270
23 LI:1182817.1:2001JAN12 1553221F6 58 551
23 LI:1182817.1:2001JAN12 5985407F6 1 664
23 LI:1182817.1:2001JAN12 6904578H1 75 604
23 LI:1182817.1:2001JAN12 5985407H1 2 287
23 LI:1182817.1:2001JAN12 1553221T6 51 545
23 LI:1182817.1:2001JAN12 6985359R8 79 299
23 LI:1182817.1:2001JAN12 5701788H1 110 379
23 LI:1182817.1:2001JAN12 6765960H1 3244 3645
23 LI:1182817.1:2001JAN12 3436771H1 3995 4158
23 LI:1182817.1:2001JAN12 5704329H1 111 366
23 LI:1182817.1:2001JAN12 5906230H1 2674 2970
23 LI:1182817.1:2001JAN12 g5754660 2807 3060
23 LI:1182817.1:2001JAN12 5637896H1 2841 3099
23 LI:1182817.1:2001JAN12 4613246H1 2863 3104
23 LI:1182817.1:2001JAN12 6478968F6 2955 3570
23 LI:1182817.1:2001JAN12 6478968H1 2955 3544
23 LI:1182817.1:2001JAN12 5374252H1 3313 3535
23 LI:1182817.1:2001JAN12 g6989944 3323 3734
23 LI:1182817.1:2001JAN12 70009231D1 3415 3906
23 LI:1182817.1:2001JAN12 4906748H2 98 390
23 LI:1182817.1:2001JAN12 4515965H1 112 349
23 LI:1182817.1:2001JAN12 412642H1 1762 1969
23 LI:1182817.1:2001JAN12 7400633H1 3425 3930
23 LI:1182817.1:2001JAN12 5637796H1 2840 3099
23 LI:1182817.1:2001JAN12 605965H1 508 712
23 LI:1182817.1:2001JAN12 8184771H1 2821 3341
23 LI:1182817.1:2001JAN12 7119092H1 167 569
23 LI:1182817.1:2001JAN12 71078169V1 2340 2700
23 LI:1182817.1:2001JAN12 g296457 2107 3078
23 LI:1182817.1:2001JAN12 547620H1 4498 4581
23 LI:1182817.1:2001JAN12 7039205H1 1959 2437
23 LI:1182817.1:2001JAN12 71076276V1 2595 2932
23 LI:1182817.1:2001JAN12 6045182R8 1880 2330
23 LI:1182817.1:2001JAN12 619505H1 3405 3650
23 LI:1182817.1:2001JAN12 6045182F8 1880 2331
23 LI:1182817.1:2001JAN12 70007048D1 3770 4158
23 LI:1182817.1:2001JAN12 547620R1 4495 4581
23 LI:1182817.1:2001JAN12 5635348H1 2840 3101
23 LI:1182817.1:2001JAN12 6970765H1 3925 4158
23 LI:1182817.1:2001JAN12 3209530H1 3422 3599
23 LI:1182817.1:2001JAN12 3682137H1 3484 3775
23 LI:1182817.1:2001JAN12 6407333H1 95 634
23 LI:1182817.1:2001JAN12 1553221H1 95 238
24 LI:1170153.9:2001JAN12 6780122J1 235 646
24 LI:1170153.9:2001JAN12 7637752H1 1 492
24 LI:1170153.9:2001JAN12 7711512J1 358 982
24 LI:1170153.9:2001JAN12 7711512H2 890 1411
25 LI:1171553.1:2001JAN12 g1139962 2929 3148
25 LI:1171553.1:2001JAN12 7279338H1 1527 2023
25 LI:1171553.1:2001JAN12 2766073F6 32 524
25 LI:1171553.1:2001JAN12 6847280H1 1718 2229
25 LI:1171553.1:2001JAN12 6847279H1 1719 2222
25 LI:1171553.1:2001JAN12 7030822H1 1505 2034
25 LI:1171553.1:2001JAN12 70749683V1 1025 1592
25 LI:1171553.1:2001JAN12 4357194H1 1045 1147
25 LI:1171553.1:2001JAN12 g728172 1066 1333
25 LI:1171553.1:2001JAN12 70746065V1 1101 1676
25 LI:1171553.1:2001JAN12 6847280F6 1718 2238
25 LI:1171553.1:2001JAN12 6803301H1 34 400
25 LI:1171553.1:2001JAN12 6803301J1 34 410
25 LI:1171553.1:2001JAN12 7584018H1 39 585
25 LI:1171553.1:2001JAN12 4622596H1 57 301
25 LI:1171553.1:2001JAN12 579075H1 59 320
25 LI:1171553.1:2001JAN12 2210964H1 60 328
25 LI:1171553.1:2001JAN12 7269665H1 172 689
25 LI:1171553.1:2001JAN12 493487H1 225 467
25 LI:1171553.1:2001JAN12 3346144T6 2619 3132
25 LI:1171553.1:2001JAN12 5543896T8 1641 2201
25 LI:1171553.1:2001JAN12 70748784V1 1486 2057
25 LI:1171553.1:2001JAN12 3965837H1 854 964
25 LI:1171553.1:2001JAN12 3962267H1 854 1084
25 LI:1171553.1:2001JAN12 70755507V1 858 1049
25 LI:1171553.1:2001JAN12 g1927512 801 1268
25 LI:1171553.1:2001JAN12 8186679H1 932 1554
25 LI:1171553.1:2001JAN12 70746640V1 947 1273
25 LI:1171553.1:2001JAN12 6570232H1 809 1353
25 LI:1171553.1:2001JAN12 70749500V1 952 1020
25 LI:1171553.1:2001JAN12 6570232F8 835 1353
25 LI:1171553.1:2001JAN12 3962267T8 853 1276
25 LI:1171553.1:2001JAN12 3962293H1 854 1123
25 LI:1171553.1:2001JAN12 1525067H1 990 1194
25 LI:1171553.1:2001JAN12 g1634138 992 1378
25 LI:1171553.1:2001JAN12 6465961F7 692 1262
25 LI:1171553.1:2001JAN12 6465961H1 692 1175
25 LI:1171553.1:2001JAN12 70746313V1 702 1310
25 LI:1171553.1:2001JAN12 6465961F8 717 1273
25 LI:1171553.1:2001JAN12 6570232F6 745 1353
25 LI:1171553.1:2001JAN12 70748165V1 772 992
25 LI:1171553.1:2001JAN12 70747578V1 1642 2230
25 LI:1171553.1:2001JAN12 507724R1 1658 2102
25 LI:1171553.1:2001JAN12 507724H1 1658 1958
25 LI:1171553.1:2001JAN12 g505547 1561 1902
25 LI:1171553.1:2001JAN12 3962293T9 1585 2087
25 LI:1171553.1:2001JAN12 3602618H1 1594 1910
25 LI:1171553.1:2001JAN12 290776H1 1807 2108
25 LI:1171553.1:2001JAN12 184712T6 1814 2183
25 LI:1171553.1:2001JAN12 4858565T7 1822 2105
25 LI:1171553.1:2001JAN12 g4988603 1835 2223
25 LI:1171553.1:2001JAN12 g3960390 1835 2221
25 LI:1171553.1:2001JAN12 g1927397 1853 2221
25 LI:1171553.1:2001JAN12 8017138J1 1894 2548
25 LI:1171553.1:2001JAN12 3384759F6 1910 2209
25 LI:1171553.1:2001JAN12 3384759H1 1910 2169
25 LI:1171553.1:2001JAN12 g4302161 1919 2222
25 LI:1171553.1:2001JAN12 70749172V1 1960 2572
25 LI:1171553.1:2001JAN12 g2787880 1983 2141
25 LI:1171553.1:2001JAN12 70747353V1 2179 2724
25 LI:1171553.1:2001JAN12 70749489V1 2180 2650
25 LI:1171553.1:2001JAN12 5574155H1 2196 2448
25 LI:1171553.1:2001JAN12 70755004V1 2200 2348
25 LI:1171553.1:2001JAN12 6052963J1 2302 2656
25 LI:1171553.1:2001JAN12 7690493H1 2419 2507
25 LI:1171553.1:2001JAN12 5574155T9 2501 3038
25 LI:1171553.1:2001JAN12 2766073T6 2548 3111
25 LI:1171553.1:2001JAN12 70747745V1 1127 1742
25 LI:1171553.1:2001JAN12 70750202V1 1148 1612
25 LI:1171553.1:2001JAN12 184712R6 1194 1548
25 LI:1171553.1:2001JAN12 184712F1 1659 2241
25 LI:1171553.1:2001JAN12 6052963H1 1669 2170
25 LI:1171553.1:2001JAN12 70754414V1 1512 2077
25 LI:1171553.1:2001JAN12 2766073H1 32 324
25 LI:1171553.1:2001JAN12 3371190H1 1 209
25 LI:1171553.1:2001JAN12 70747296V1 25 598
25 LI:1171553.1:2001JAN12 70761984V1 32 354
25 LI:1171553.1:2001JAN12 70748552V1 550 1139
25 LI:1171553.1:2001JAN12 6169913H1 1538 1838
25 LI:1171553.1:2001JAN12 7042860H1 1597 2163
25 LI:1171553.1:2001JAN12 7042860R8 1598 2194
25 LI:1171553.1:2001JAN12 7042860F8 1598 2138
25 LI:1171553.1:2001JAN12 70749765V1 348 964
25 LI:1171553.1:2001JAN12 55068831J1 385 518
25 LI:1171553.1:2001JAN12 55068831H1 409 544
25 LI:1171553.1:2001JAN12 70746739V1 510 1038
25 LI:1171553.1:2001JAN12 184712H1 1194 1376
25 LI:1171553.1:2001JAN12 70747506V1 1250 1838
25 LI:1171553.1:2001JAN12 1505781H1 1400 1614
25 LI:1171553.1:2001JAN12 4858565F7 1410 1865
25 LI:1171553.1:2001JAN12 4858565H1 1410 1511
25 LI:1171553.1:2001JAN12 70746771V1 1445 1779
25 LI:1171553.1:2001JAN12 3962267T9 1640 2089
25 LI:1171553.1:2001JAN12 g6035502 1747 2212
25 LI:1171553.1:2001JAN12 g2558367 1752 2212
25 LI:1171553.1:2001JAN12 70754307V1 1787 2120
25 LI:1171553.1:2001JAN12 g727817 1802 2212
25 LI:1171553.1:2001JAN12 8138155T1 1801 2119
25 LI:1171553.1:2001JAN12 7941653H1 255 470
25 LI:1171553.1:2001JAN12 5543896H1 555 768
25 LI:1171553.1:2001JAN12 70755134V1 644 854
25 LI:1171553.1:2001JAN12 70750777V1 646 1254
25 LI:1171553.1:2001JAN12 70751065V1 639 1255
25 LI:1171553.1:2001JAN12 5543896F8 555 924
26 LI:2121978.1:2001JAN12 8324855J1 1 480
27 LI:1174292.5:2001JAN12 5044943H1 84 333
27 LI:1174292.5:2001JAN12 g2241254 2042 2437
27 LI:1174292.5:2001JAN12 g5636089 2156 2617
27 LI:1174292.5:2001JAN12 1301303H1 516 762
27 LI:1174292.5:2001JAN12 70814001V1 939 1593
27 LI:1174292.5:2001JAN12 g746862 3364 3540
27 LI:1174292.5:2001JAN12 6516637H1 68 608
27 LI:1174292.5:2001JAN12 4344244H1 1252 1532
27 LI:1174292.5:2001JAN12 4384858H1 7 184
27 LI:1174292.5:2001JAN12 1365434H1 2289 2535
27 LI:1174292.5:2001JAN12 5482659H1 1734 2004
27 LI:1174292.5:2001JAN12 g3899860 2211 2617
27 LI:1174292.5:2001JAN12 5102048T6 2204 2326
27 LI:1174292.5:2001JAN12 1752713H1 2457 2688
27 LI:1174292.5:2001JAN12 2865004H1 72 380
27 LI:1174292.5:2001JAN12 70649482V1 1205 1839
27 LI:1174292.5:2001JAN12 6191116H1 2712 3038
27 LI:1174292.5:2001JAN12 g5673730 2732 3079
27 LI:1174292.5:2001JAN12 388169H1 3680 3970
27 LI:1174292.5:2001JAN12 042436H1 2883 3106
27 LI:1174292.5:2001JAN12 4251908H1 2926 3179
27 LI:1174292.5:2001JAN12 71297258V1 3076 3182
27 LI:1174292.5:2001JAN12 3488035H1 3112 3390
27 LI:1174292.5:2001JAN12 71297162V1 3116 3390
27 LI:1174292.5:2001JAN12 70989354V1 3116 3390
27 LI:1174292.5:2001JAN12 70991039V1 3116 3390
27 LI:1174292.5:2001JAN12 g747217 3856 4045
27 LI:1174292.5:2001JAN12 g1445214 3885 4067
27 LI:1174292.5:2001JAN12 g714619 3321 3680
27 LI:1174292.5:2001JAN12 5608250H1 3909 3961
Figure imgf000221_0001
Figure imgf000221_0002
27 LI:1174292.5:2001JAN12 319207H1 1881 2258
27 LI:1174292.5:2001JAN12 5899378H1 1892 2158
27 LI:1174292.5:2001JAN12 70815211V1 1965 2551
27 LI:1174292.5:2001JAN12 60208741U1 1978 2340
27 LI:1174292.5:2001JAN12 g6475480 2316 2617
27 LI:1174292.5:2001JAN12 g5673074 3338 3538
27 LI:1174292.5:2001JAN12 70812295V1 1120 1393
27 LI:1174292.5:2001JAN12 g2208504 1444 1861
27 LI:1174292.5:2001JAN12 1301303T6 1173 1784
27 LI:1174292.5:2001JAN12 70811699V1 1455 1954
27 LI:1174292.5:2001JAN12 5728849H1 1354 1717
27 LI:1174292.5:2001JAN12 1746188F6 1866 2381
27 LI:1174292.5:2001JAN12 6128031F6 2020 2593
27 LI:1174292.5:2001JAN12 5728849F6 1353 2069
27 LI:1174292.5:2001JAN12 60215486U1 97 564
27 LI:1174292.5:2001JAN12 g1445215 3338 3404
27 LI:1174292.5:2001JAN12 60215481U1 1050 1509
27 LI:1174292.5:2001JAN12 6396723F6 386 976
27 LI:1174292.5:2001JAN12 7928421H1 2214 2613
27 LI:1174292.5:2001JAN12 g560245 2903 3075
27 LI:1174292.5:2001JAN12 1365434R6 2289 2615
27 LI:1174292.5:2001JAN12 1702474F6 386 650
27 LI:1174292.5:2001JAN12 4109619H1 3832 3960
27 LI:1174292.5:2001JAN12 g6574959 1498 1847
27 LI:1174292.5:2001JAN12 768224H1 1611 1853
27 LI:1174292.5:2001JAN12 g3919927 1576 1862
27 LI:1174292.5:2001JAN12 1617118H1 1883 2084
27 LI:1174292.5:2001JAN12 g2595201 1686 1937
27 LI:1174292.5:2001JAN12 g3418456 1521 1855
27 LI:1174292.5:2001JAN12 2790135H2 1934 2165
27 LI:1174292.5:2001JAN12 2772125H1 1139 1391
27 LI:1174292.5:2001JAN12 2731430H1 1394 1629
27 LI:1174292.5:2001JAN12 3817084H1 1911 2178
27 LI:1174292.5:2001JAN12 g3298942 1530 1835
27 LI:1174292.5:2001JAN12 4125311H1 2066 2333
27 LI:1174292.5:2001JAN12 60215482U1 938 1468
27 LI:1174292.5:2001JAN12 70812078V1 991 1587
27 LI:1174292.5:2001JAN12 70816015V1 1001 1641
27 LI:1174292.5:2001JAN12 70812080V1 1018 1527
27 LI:1174292.5:2001JAN12 70813268V1 927 1553
27 LI:1174292.5:2001JAN12 70816143V1 1092 1570
27 LI:1174292.5:2001JAN12 6729472H1 616 1207
27 LI:1174292.5:2001JAN12 g5152323 3337 3574
27 LI:1174292.5:2001JAN12 2844156H1 966 1241
27 LI:1174292.5:2001JAN12 5559861H1 1900 2117
27 LI:1174292.5:2001JAN12 5102048F6 1738 2323
27 LI:1174292.5:2001JAN12 g2240918 1727 2076
27 LI:1174292.5:2001JAN12 70813591V1 1051 1615
27 LI:1174292.5:2001JAN12 6631603R8 998 1491
27 LI:1174292.5:2001JAN12 4765479H1 54 306
27 LI:1174292.5:2001JAN12 5482373F6 1743 2340
27 LI:1174292.5:2001JAN12 6128479H1 2143 2613
27 LI:1174292.5:2001JAN12 2861529H1 53 318
27 LI:1174292.5:2001JAN12 5102048H1 1738 1987
27 LI:1174292.5:2001JAN12 6747886H1 2212 2407
27 LI:1174292.5:2001JAN12 g5878968 2163 2615
27 LI:1174292.5:2001JAN12 7701481H1 1307 1889
27 LI:1174292.5:2001JAN12 g566560 3338 3654
27 LI:1174292.5:2001JAN12 3461329H1 277 389
27 LI:1174292.5:2001JAN12 70816833V1 315 898
27 LI:1174292.5:2001JAN12 433634R6 384 784
27 LI:1174292.5:2001JAN12 60208742U1 925 1401
27 LI:1174292.5:2001JAN12 60215479U1 651 1094
27 LI:1174292.5:2001JAN12 4315517H1 787 1068
27 LI:1174292.5:2001JAN12 70816149V1 794 1420
27 LI:1174292.5:2001JAN12 3782084H1 817 1132
27 LI:1174292.5:2001JAN12 70811782V1 916 1405
27 LI:1174292.5:2001JAN12 60203788U2 925 1401
27 LI:1174292.5:2001JAN12 6128891F8 2143 2643
27 LI:1174292.5:2001JAN12 4384858F6 7 196
27 LI:1174292.5:2001JAN12 6082132F8 10 562
27 LI:1174292.5:2001JAN12 7363981H1 13 574
27 LI:1174292.5:2001JAN12 60215484U1 78 548
27 LI:1174292.5:2001JAN12 70813546V1 123 762
27 LI:1174292.5:2001JAN12 70814644V1 218 864
27 LI:1174292.5:2001JAN12 2477153T6 247 577
27 LI:1174292.5:2001JAN12 g2818305 2081 2437
27 LI:1174292.5:2001JAN12 3592751H1 1315 1636
27 LI:1174292.5:2001JAN12 433634R1 384 898
27 LI:1174292.5:2001JAN12 g2599099 1684 1933
27 LI:1174292.5:2001JAN12 1550876T6 1862 2217
27 LI:1174292.5:2001JAN12 70814410V1 91 643
27 LI:1174292.5:2001JAN12 g2384652 1 2293
27 LI:1174292.5:2001JAN12 3275520H1 1 267
27 LI:1174292.5:2001JAN12 70811649V1 7 569
27 LI:1174292.5:2001JAN12 60208699U1 2017 2372
27 LI:1174292.5:2001JAN12 g1891978 1747 2137
27 LI:1174292.5:2001JAN12 1550876H1 1837 2043
27 LI:1174292.5:2001JAN12 1545224H1 1315 1498
27 LI:1174292.5:2001JAN12 2906831H1 1704 1836
27 LI:1174292.5:2001JAN12 3364048H1 1584 1765
27 LI:1174292.5:2001JAN12 2945872H2 1247 1544
27 LI:1174292.5:2001JAN12 7254181H1 884 1461
27 LI:1174292.5:2001JAN12 g5513770 2188 2617
27 LI:1174292.5:2001JAN12 779596H1 1056 1326
27 LI:1174292.5:2001JAN12 g664566 1506 1802
27 LI:1174292.5:2001JAN12 7943284J2 1497 1841
27 LI:1174292.5:2001JAN12 g1616477 1678 1860
27 LI:1174292.5:2001JAN12 6082132H1 20 407
27 LI:1174292.5:2001JAN12 3021414H1 2182 2377
27 LI:1174292.5:2001JAN12 3254104H1 1004 1277
27 LI:1174292.5:2001JAN12 g4329312 1394 1859
27 LI:1174292.5:2001JAN12 2348110H1 2399 2615
27 LI:1174292.5:2001JAN12 70814746V1 1226 1834
27 LI:1174292.5:2001JAN12 755739H1 1448 1674
27 LI:1174292.5:2001JAN12 5732209H1 1270 1535
27 LI:1174292.5:2001JAN12 7584421H1 1245 1737
27 LI:1174292.5:2001JAN12 g5548346 2246 2617
27 LI:1174292.5:2001JAN12 5689379H1 1614 1715
27 LI:1174292.5:2001JAN12 g1616476 785 1181
27 LI:1174292.5:2001JAN12 g664580 1416 1705
27 LI:1174292.5:2001JAN12 1360406T6 1290 1760
27 LI:1174292.5:2001JAN12 4514655H1 294 381
27 LI:1174292.5:2001JAN12 1702474H1 386 498
27 LI:1174292.5:2001JAN12 6074842H1 1520 1813
27 LI:1174292.5:2001JAN12 6041309H1 1714 2123
27 LI:1174292.5:2001JAN12 g1891852 2042 2437
27 LI:1174292.5:2001JAN12 5482373H1 1734 2021
27 LI:1174292.5:2001JAN12 70649463V1 1148 1652
27 LI:1174292.5:2001JAN12 334466H1 1881 2130
27 LI:1174292.5:2001JAN12 2764772H1 1399 1641
27 LI:1174292.5:2001JAN12 5899894H1 1916 2173
27 LI:1174292.5:2001JAN12 7244360H1 2089 2437
27 LI:1174292.5:2001JAN12 6128479F6 2143 2707
27 LI:1174292.5:2001JAN12 3716538H1 674 909
27 LI:1174292.5:2001JAN12 4774519H1 1609 1880
27 LI:1174292.5:2001JAN12 5589184H1 39 184
27 LI:1174292.5:2001JAN12 4913344H1 1669 1835
27 LI:1174292.5:2001JAN12 575477H1 1578 1835
27 LI:1174292.5:2001JAN12 4597333H1 1546 1800
27 LI:1174292.5:2001JAN12 70813490V1 669 1202
27 LI:1174292.5:2001JAN12 7701481J1 1978 2604
27 LI:1174292.5:2001JAN12 755739R6 1448 1831
27 LI:1174292.5:2001JAN12 g678136 3335 3663
27 LI:1174292.5:2001JAN12 g5151992 1602 1864
27 LI:1174292.5:2001JAN12 70814293V1 779 1402
27 LI:1174292.5:2001JAN12 g4664424 2195 2615
27 LI:1174292.5:2001JAN12 g3837532 2113 2437
27 LI:1174292.5:2001JAN12 g564175 1932 2171
27 LI:1174292.5:2001JAN12 5709938H1 1909 2174
27 LI:1174292.5:2001JAN12 6128891F6 2145 2707
27 LI:1174292.5:2001JAN12 g879187 1531 1840
27 LI:1174292.5:2001JAN12 70812860V1 2123 2615
27 LI:1174292.5:2001JAN12 3488035F6 3122 3390
27 LI:1174292.5:2001JAN12 3183652H1 6 324
27 LI:1174292.5:2001JAN12 2764773H1 1399 1642
27 LI:1174292.5:2001JAN12 2428891H1 1599 1835
27 LI:1174292.5:2001JAN12 3397356H1 1785 2025
27 LI:1174292.5:2001JAN12 70816599V1 1156 1685
27 LI:1174292.5:2001JAN12 1550876R6 1838 2261
Figure imgf000225_0001
Figure imgf000226_0001
Figure imgf000226_0002
Figure imgf000226_0003
Figure imgf000227_0001
Figure imgf000227_0002
TABLE 5
SEQ ID NO: Template ID Component ID Start Stop
31 LI:758541.1:2001JAN12 3812114T6 2723 2881
31 LI:758541.1:2001JAN12 3838635H1 2845 2927
31 LI:758541.1:2001JAN12 g2841729 2848 2932
31 LI:758541.1:2001JAN12 2719179F6 1 402
31 LI:758541.1:2001JAN12 6783407H2 86 659
31 LI:758541.1:2001JAN12 3520795H1 123 433
31 LI:758541.1:2001JAN12 7119178F8 145 377
31 LI:758541.1:2001JAN12 7119178H1 145 377
31 LI:758541.1:2001JAN12 5208572F6 154 755
31 LI:758541.1:2001JAN12 5208572H1 154 391
31 LI:758541.1:2001JAN12 6513345H1 217 757
31 LI:758541.1:2001JAN12 6513345F7 217 819
32 LI:137815.1:2001JAN12 71261842V1 2415 2921
32 LI:137815.1:2001JAN12 6332261H1 3662 3993
32 LI:137815.1:2001JAN12 71105313V1 3684 3990
32 LI:137815.1:2001JAN12 71105788V1 3686 4200
32 LI:137815.1:2001JAN12 g1545726 3690 3991
32 LI:137815.1:2001JAN12 71105931V1 3704 3990
32 LI:137815.1:2001JAN12 71261065V1 3736 4245
32 LI:137815.1:2001JAN12 71106885V1 3939 4395
32 LI:137815.1:2001JAN12 71260985V1 4164 4695
32 LI:137815.1:2001JAN12 7120630H1 4173 4503
32 LI:137815.1:2001JAN12 71105992V1 4328 4931
32 LI:137815.1:2001JAN12 71105586V1 4336 4588
32 LI:137815.1:2001JAN12 1865354H1 3506 3778
32 LI:137815.1:2001JAN12 7651002J1 3546 3955
32 LI:137815.1:2001JAN12 71105703V1 3660 3990
32 LI:137815.1:2001JAN12 3649382T6 3065 3607
32 LI:137815.1:2001JAN12 6485489R9 3117 3664
32 LI:137815.1:2001JAN12 1432090R7 3125 3626
32 LI:137815.1:2001JAN12 1432090H1 3125 3357
32 LI:137815.1:2001JAN12 7258715T6 3158 3439
32 LI:137815.1:2001JAN12 g2433198 3157 3364
32 LI:137815.1:2001JAN12 g3778214 3168 3625
32 LI:137815.1:2001JAN12 1432090T6 3193 3590
32 LI:137815.1:2001JAN12 g769959 3300 3561
32 LI:137815.1:2001JAN12 71106233V1 3369 3880
32 LI:137815.1:2001JAN12 71107283V1 3403 3925
32 LI:137815.1:2001JAN12 g1764406 3429 3732
32 LI:137815.1:2001JAN12 5761763H1 3460 3579
32 LI:137815.1:2001JAN12 71107446V1 3465 3996
32 LI:137815.1:2001JAN12 71106259V1 3487 3990
32 LI:137815.1:2001JAN12 71107003V1 2863 3421
32 LI:137815.1:2001JAN12 71261315V1 2890 3413
32 LI:137815.1:2001JAN12 71107304V1 2916 3360
32 LI:137815.1:2001JAN12 71260760V1 2913 3461
32 LI:137815.1:2001JAN12 6916445H1 2922 3432
32 LI:137815.1:2001JAN12 8269736U1 3033 3459
32 LI:137815.1:2001JAN12 g4891355 4388 4848
32 LI:137815.1:2001JAN12 1889561F6 4544 4960
Figure imgf000229_0001
Figure imgf000229_0002
Figure imgf000230_0001
Figure imgf000230_0002
Figure imgf000231_0001
Figure imgf000231_0002
Figure imgf000232_0001
Figure imgf000232_0002
Figure imgf000233_0001
TABLE 5
SEQ ID NO: Template ID Component ID Start Stop
36 LI:329770.1:2001JAN12 70607866V1 407 504
36 LI:329770.1:2001JAN12 70604785V1 211 499
36 LI:329770.1:2001JAN12 70688331V1 494
36 LI:329770.1:2001JAN12 70960122V1 481
36 LI:329770.1:2001JAN12 70685859V1 418
36 LI:329770.1:2001JAN12 2918883F6 409
36 LI:329770.1:2001JAN12 7594768H1 356
36 LI:329770.1:2001JAN12 70089283V1 335
36 LI:329770.1:2001JAN12 2918883H1 251
36 LI:329770.1:2001JAN12 70692597V1 350 1001
36 LI:329770.1:2001JAN12 70607432V1 596 982
36 LI:329770.1:2001JAN12 70694285V1 809 939
36 LI:329770.1:2001JAN12 70691777V1 350 900
36 LI:329770.1:2001JAN12 70533480V1 809 901
36 LI:329770.1:2001JAN12 70686916V1 293 878
36 LI:329770.1:2001JAN12 70453719V1 200 854
36 LI:329770.1:2001JAN12 70687283V1 1 540
36 LI:329770.1:2001JAN12 70695778V1 197 504
36 LI:329770.1:2001JAN12 70687814V1 1 504
36 LI:329770.1:2001JAN12 70694491V1 809 1030
36 LI:329770.1:2001JAN12 70091532V1 440 1028
36 LI:329770.1:2001JAN12 70647371V1 354 1022
36 LI:329770.1:2001JAN12 70453161V1 386 1067
36 LI:329770.1:2001JAN12 7059435H1 809 1057
36 LI:329770.1:2001JAN12 70452814V1 809 1049
36 LI:329770.1:2001JAN12 70457491V1 809 1032
36 LI:329770.1:2001JAN12 71684873V1 771 1527
36 LI:329770.1:2001JAN12 70687095V1 882 1476
36 LI:329770.1:2001JAN12 70092770V1 1038 1393
36 LI:329770.1:2001JAN12 70755190V1 915 1371
36 LI:329770.1:2001JAN12 70091001V1 1134 1645
36 LI:329770.1:2001JAN12 70675684V1 1011 1642
36 LI:329770.1:2001JAN12 70688900V1 1225 1572
36 LI:329770.1:2001JAN12 71535201V1 809 1518
36 LI:329770.1:2001JAN12 70689505V1 969 1575
36 LI:329770.1:2001JAN12 70688828V1 962 1575
36 LI:329770.1:2001JAN12 70689821V1 969 1575
36 LI:329770.1:2001JAN12 70688770V1 969 1572
36 LI:329770.1:2001JAN12 70607056V1 809 1099
36 LI:329770.1:2001JAN12 70457254V1 809 1088
36 LI:329770.1:2001JAN12 70683894V1 809 1156
36 LI:329770.1:2001JAN12 70694316V1 809 1155
36 LI:329770.1:2001JAN12 70685541V1 809 1146
36 LI:329770.1:2001JAN12 70655012V1 809 1153
36 LI:329770.1:2001JAN12 70960359V1 809 1151
36 LI:329770.1:2001JAN12 70801401V1 809 1150
36 LI:329770.1:2001JAN12 71533526V1 430 1145
36 LI:329770.1:2001JAN12 70605038V1 809 1103
36 LI:329770.1:2001JAN12 70608146V1 868 1098
36 LI:329770.1:2001JAN12 70690952V1 809 1099
36 LI:329770.1:2001JAN12 70751801V1 809 1335
36 LI:329770.1:2001JAN12 70606568V1 809 1273
36 LI:329770.1:2001JAN12 70604878V1 809 1271
36 LI:329770.1:2001JAN12 70802844V1 809 1272
36 LI:329770.1:2001JAN12 70455619V1 809 1246
36 LI:329770.1:2001JAN12 70456834V1 809 1241
36 LI:329770.1:2001JAN12 70960717V1 809 1236
36 LI:329770.1:2001JAN12 70689112V1 809 1227
36 LI:329770.1:2001JAN12 70688995V1 809 1219
36 LI:329770.1:2001JAN12 70607882V1 809 1215
36 LI:329770.1:2001JAN12 70454278V1 809 1166
36 LI:329770.1:2001JAN12 70606220V1 846 1152
36 LI:329770.1:2001JAN12 70754576V1 868 1364
36 LI:329770.1:2001JAN12 2428620H1 1101 1347
36 LI:329770.1:2001JAN12 70454354V1 809 1084
36 LI:329770.1:2001JAN12 70695646V1 809 1085
36 LI:329770.1:2001JAN12 70457155V1 809 1073
36 LI:329770.1:2001JAN12 70647347V1 809 1063
36 LI:329770.1:2001JAN12 70692842V1 809 1068
37 LI:898841.9:2001JAN12 7192534H1 331 915
37 LI:898841.9:2001JAN12 8194961J1 1 762
37 LI:898841.9:2001JAN12 6817573H1 269 749
37 LI:898841.9:2001JAN12 6817573F8 381 721
37 LI:898841.9:2001JAN12 6817573F6 165 721
38 LI:1183848.3:2001JAN12 7020978H1 1 498
39 LI:2037121.1:2001JAN12 7989007H1 78 644
39 LI:2037121.1:2001JAN12 7446676T1 122 588
39 LI:2037121.1:2001JAN12 7413178H1 119 748
39 LI:2037121.1:2001JAN12 8134194H1 138 735
39 LI:2037121.1:2001JAN12 72036570V1 153 406
39 LI:2037121.1:2001JAN12 g8007601 162 642
39 LI:2037121.1:2001JAN12 72126488V1 246 421
39 LI:2037121.1:2001JAN12 g8358616 253 640
39 LI:2037121.1:2001JAN12 72033882V1 266 647
39 LI:2037121.1:2001JAN12 71470719V1 312 369
39 LI:2037121.1:2001JAN12 72032618V1 648
39 LI:2037121.1:2001JAN12 71537246V1 641
39 LI:2037121.1:2001JAN12 72033783V1 646
39 LI:2037121.1:2001JAN12 72011443V1 90
39 LI:2037121.1:2001JAN12 7674455H2 51 704
40 LI:356090.1:2001JAN12 70863709V1 31 416
40 LI:356090.1:2001JAN12 71227229V1 32 413
40 LI:356090.1:2001JAN12 71228515V1 32 405
40 LI:356090.1:2001JAN12 70861981V1 31 391
40 LI:356090.1:2001JAN12 70861922V1 31 390
40 LI:356090.1:2001JAN12 70864304V1 31 343
40 LI:356090.1:2001JAN12 70864633V1 31 329
40 LI:356090.1:2001JAN12 70862585V1 31 294
40 LI:356090.1:2001JAN12 70079244U1 31 231
40 LI:356090.1:2001JAN12 70076458U1 11 230
40 LI:356090.1:2001JAN12 70863660V1 34 569
40 LI:356090.1:2001JAN12 70079856U1 84 557
40 LI:356090.1:2001JAN12 70862245V1 31 543
40 LI:356090.1:2001JAN12 70079058U1 1 508
40 LI:356090.1:2001JAN12 70074940U1 52 502
40 LI:356090.1:2001JAN12 70862729V1 31 490
40 LI:356090.1:2001JAN12 71227005V1 31 489
40 LI:356090.1:2001JAN12 70075515U1 31 473
40 LI:356090.1:2001JAN12 71227343V1 1 437
40 LI:356090.1:2001JAN12 8010811H1 580 1116
40 LI:356090.1:2001JAN12 70861125V1 162 688
40 LI:356090.1:2001JAN12 70075053U1 339 660
40 LI:356090.1:2001JAN12 70078603U1 1 656
40 LI:356090.1:2001JAN12 70074903U1 160 656
40 LI:356090.1:2001JAN12 70078544U1 168 655
40 LI:356090.1:2001JAN12 70078319U1 271 656
40 LI:356090.1:2001JAN12 70079370U1 166 654
40 LI:356090.1:2001JAN12 70075069U1 289 655
40 LI:356090.1:2001JAN12 70075221U1 290 654
40 LI:356090.1:2001JAN12 70075667U1 301 655
40 LI:356090.1:2001JAN12 70080523U1 233 654
40 LI:356090.1:2001JAN12 70078746U1 481 654
40 LI:356090.1:2001JAN12 70078621U1 144 654
40 LI:356090.1:2001JAN12 70075142U1 470 654
40 LI:356090.1:2001JAN12 70078628U1 148 654
40 LI:356090.1:2001JAN12 2950926T6 419 650
40 LI:356090.1:2001JAN12 70076119U1 385 650
40 LI:356090.1:2001JAN12 70078048U1 378 650
40 LI:356090.1:2001JAN12 70076590U1 361 648
40 LI:356090.1:2001JAN12 70861592V1 60 648
40 LI:356090.1:2001JAN12 70076847U1 237 648
40 LI:356090.1:2001JAN12 70861835V1 187 648
40 LI:356090.1:2001JAN12 70075666U1 375 648
40 LI:356090.1:2001JAN12 70076635U1 462 647
40 LI:356090.1:2001JAN12 70077993U1 204 642
40 LI:356090.1:2001JAN12 70864362V1 37 634
40 LI:356090.1:2001JAN12 70862035V1 32 577
40 LI:356090.1:2001JAN12 70862234V1 31 576
40 LI:356090.1:2001JAN12 70863276V1 31 230
40 LI:356090.1:2001JAN12 70076259U1 31 202
40 LI:356090.1:2001JAN12 70860822V1 1 159
40 LI:356090.1:2001JAN12 70861272V1 31 87
41 LI:212142.1:2001JAN12 g4897764 1 216
41 LI:212142.1:2001JAN12 g1812720 15 216
41 LI:212142.1:2001JAN12 3125917H1 61 150
41 LI:212142.1:2001JAN12 6927793R8 63 712
41 LI:212142.1:2001JAN12 7764545H1 254 692
41 LI:212142.1:2001JAN12 7764545J1 254 692
42 LI:1096706.1:2001JAN12 4295414T9 1 617
43 LI:012622.1:2001JAN12 g4312998 1 436
Figure imgf000237_0001
Figure imgf000238_0001
Figure imgf000239_0001
Figure imgf000239_0002
TABLE 5
SEQ ID NO: Template ID Component ID Start Stop
45 LI:023813.1:2001JAN12 71591588V1 1237 1953
45 LI:023813.1:2001JAN12 71051086V1 1229 1806
45 LI:023813.1:2001JAN12 3938096H1 1235 1354
45 LI:023813.1:2001JAN12 70136311V1 1050 1457
45 LI:023813.1:2001JAN12 71596758V1 1049 1740
45 LI:023813.1:2001JAN12 71593352V1 1783 2547
45 LI:023813.1:2001JAN12 2579582H1 391 574
45 LI:023813.1:2001JAN12 g650612 2193 2470
45 LI:023813.1:2001JAN12 2330443H1 2200 2467
45 LI:023813.1:2001JAN12 70139578V1 2170 2458
45 LI:023813.1:2001JAN12 g650611 2174 2447
45 LI:023813.1:2001JAN12 764838H1 2013 2311
45 LI:023813.1:2001JAN12 g2987604 2011 2301
45 LI:023813.1:2001JAN12 1242649H1 1604 1829
45 LI:023813.1:2001JAN12 71596934V1 1617 2367
45 LI:023813.1:2001JAN12 7760577H1 1636 2298
45 LI:023813.1:2001JAN12 70140562V1 1632 2127
45 LI:023813.1:2001JAN12 6456152H1 1693 2345
45 LI:023813.1:2001JAN12 71592946V1 1766 2547
45 LI:023813.1:2001JAN12 7703837J1 1583 2231
45 LI:023813.1:2001JAN12 71053823V1 1586 2241
45 LI:023813.1:2001JAN12 3422029H1 1227 1502
45 LI:023813.1:2001JAN12 6123937H1 1227 1745
45 LI:023813.1:2001JAN12 5603955H1 1224 1448
45 LI:023813.1:2001JAN12 70138418V1 1036 1482
45 LI:023813.1:2001JAN12 2917935H1 2164 2444
45 LI:023813.1:2001JAN12 2753762H1 2166 2441
45 LI:023813.1:2001JAN12 g4735992 2050 2468
45 LI:023813.1:2001JAN12 6575246H1 1211 1668
45 LI:023813.1:2001JAN12 2579582F6 391 574
45 LI:023813.1:2001JAN12 71593267V1 1567 2296
45 LI:023813.1:2001JAN12 71054430V1 391 943
45 LI:023813.1:2001JAN12 4184801H1 2048 2346
45 LI:023813.1:2001JAN12 5602836H1 2010 2334
45 LI:023813.1:2001JAN12 71052582V1 1034 1433
45 LI:023813.1:2001JAN12 71051387V1 391 989
45 LI:023813.1:2001JAN12 5605801H1 2010 2323
45 LI:023813.1:2001JAN12 71054293V1 1027 1587
45 LI:023813.1:2001JAN12 70130903V1 1028 1440
45 LI:023813.1:2001JAN12 71051860V1 1021 1628
45 LI:023813.1:2001JAN12 4911790H1 1021 1317
45 LI:023813.1:2001JAN12 162415H1 332 572
45 LI:023813.1:2001JAN12 71050948V1 391 1002
45 LI:023813.1:2001JAN12 4141003H1 2007 2326
45 LI:023813.1:2001JAN12 71596725V1 1016 1745
45 LI:023813.1:2001JAN12 71593444V1 974 1690
45 LI:023813.1:2001JAN12 71594183V1 991 1696
45 LI:023813.1:2001JAN12 2790890H1 1005 1305
45 LI:023813.1:2001JAN12 71052622V1 1009 1603
45 LI:023813.1:2001JAN12 5619230H1 319 632
45 LI:023813.1:2001JAN12 2918553H1 2164 2422
45 LI:023813.1:2001JAN12 g4312570 1992 2461
45 LI:023813.1:2001JAN12 1217255T1 1992 2432
45 LI:023813.1:2001JAN12 g3539274 2005 2477
45 LI:023813.1:2001JAN12 g1315099 2014 2487
45 LI:023813.1:2001JAN12 617758H1 1562 1847
45 LI:023813.1:2001JAN12 5899357H1 1565 1846
45 LI:023813.1:2001JAN12 70897708V1 1213 1834
45 LI:023813.1:2001JAN12 2448346H1 928 1185
45 LI:023813.1:2001JAN12 70142332V1 937 1471
45 LI:023813.1:2001JAN12 3114411H1 912 1199
45 LI:023813.1:2001JAN12 71594328V1 907 1700
45 LI:023813.1:2001JAN12 g3308412 2161 2468
45 LI:023813.1:2001JAN12 g897555 2152 2458
45 LI:023813.1:2001JAN12 5427387H1 2163 2380
45 LI:023813.1:2001JAN12 g2322281 2165 2469
45 LI:023813.1:2001JAN12 1721227T6 2124 2431
45 LI:023813.1:2001JAN12 1339952H1 2136 2403
45 LI:023813.1:2001JAN12 5084432H1 1975 2234
45 LI:023813.1:2001JAN12 1217255H1 1992 2267
45 LI:023813.1:2001JAN12 1632463H1 1556 1747
45 LI:023813.1:2001JAN12 70139536V1 1203 1688
45 LI:023813.1:2001JAN12 3186948H1 1198 1517
45 LI:023813.1:2001JAN12 g6402027 1962 2469
45 LI:023813.1:2001JAN12 g4452564 1965 2470
45 LI:023813.1:2001JAN12 g3920225 1970 2467
45 LI:023813.1:2001JAN12 71039953V1 1952 2097
45 LI:023813.1:2001JAN12 71051095V1 1538 2325
45 LI:023813.1:2001JAN12 7620085J1 1532 2132
45 LI:023813.1:2001JAN12 70143382V1 1540 2136
45 LI:023813.1:2001JAN12 71052715V1 1550 2241
45 LI:023813.1:2001JAN12 5871821H1 1562 1841
45 LI:023813.1:2001JAN12 71596427V1 1520 2295
45 LI:023813.1:2001JAN12 2754217H1 1521 1816
45 LI:023813.1:2001JAN12 4516659H1 1510 1786
45 LI:023813.1:2001JAN12 3903106H1 900 1197
45 LI:023813.1:2001JAN12 3581039H1 293 588
45 LI:023813.1:2001JAN12 71596922V1 288 564
45 LI:023813.1:2001JAN12 3581039F6 293 574
45 LI:023813.1:2001JAN12 3278325H1 1947 2256
45 LI:023813.1:2001JAN12 7741963J1 1190 1820
45 LI:023813.1:2001JAN12 70140912V1 1191 1567
45 LI:023813.1:2001JAN12 4376245H1 1189 1471
45 LI:023813.1:2001JAN12 4374209H1 1189 1484
45 LI:023813.1:2001JAN12 4196441H1 1189 1486
45 LI:023813.1:2001JAN12 1389880T6 230 571
45 LI:023813.1:2001JAN12 7196709H1 201 574
45 LI:023813.1:2001JAN12 g791516 2129 2474
45 LI:023813.1:2001JAN12 71625082V1 1939 2460
45 LI:023813.1:2001JAN12 7703837H1 1505 1995
Figure imgf000242_0001
Figure imgf000242_0002
Figure imgf000242_0003
TABLE 5
SEQ ID NO; Template ID Component IC Start Stop
45 Ll:023813.1. 200 IJAN 12 71592264V1 858 1466
45 U:023813.1 - 200 IJAN 12 71595696V1 858 1466
45 U:023813.1 200 IJAN 12 71591762V1 862 1380
45 Ll:023813.1 2001 JAN 12 7724296J1 1 226
45 Ll:023813.1 - 200 IJAN 12 1609990H1 2120 2373
45 Ll:023813.1 - 200 IJAN 12 g852616 21 14 2475
45 Ll:023813.1 200 IJAN 12 71592734V 1 2155 2546
45 Ll:023813.1 200 IJAN 12 g2930805 2107 2470
45 Ll:023813.1 200 IJAN 12 71594851 VI 1823 2000
45 Ll:023813.1 200 IJAN 12 71051872V! 1829 2220
45 U:023813.1 2001 JAN 12 343785H1 1801 1939
45 Ll:023813,l 200 IJAN 12 2579582T6 1809 2425
45 Ll:023813.1 200 IJAN 12 1702260T6 1810 2430
45 LI-.023813.1 200 IJAN 12 g725154 1815 2016
45 Ll:023813.1 200 IJAN 12 809399R1 1796 2413
45 Ll:023813.1 200 IJAN 12 809399H1 1796 1900
45 Ll:023813.1 200 IJAN 12 71596228V 1 1831 2484
45 LI.O23813.1 200 IJAN 12 gl315027 1797 2315
45 Ll:023813.1 200 IJAN 12 5612714H1 1778 2022
45 Ll:023813.1 200 IJAN 12 6316652H1 1780 2099
45 LI.O23813.1 200 IJAN 12 4284237H1 1794 21 16
45 Ll:023813.1 200 IJAN 12 009619H1 1772 2068
45 Ll:023813.1 200 IJAN 12 71053089V 1 1773 2221
45 U:023813, 1 200 IJAN 12 4949436H1 1453 1762
45 Ll:023813.1 200 IJAN 12 1881176H1 1432 1729
45 Ll:023813.1 200 IJAN 12 4239223H1 1439 1753
45 Ll:023813.1 200 IJAN 12 70138975V1 1445 1942
45 LI-,023813.1 200 IJAN 12 71593994V1 1454 2042
45 Ll:023813.1 200 IJAN 12 3871668H1 1409 1615
45 Ll:023813.1 200 IJAN 12 71594516V1 1422 2132
45 Ll:023813.1 200 IJAN 12 71594169V1 1425 2038
45 Ll:023813.1 200 IJAN 12 71593408V1 1433 2206
45 Ll:023813.1 200 IJAN 12 71597347V1 1430 2185
45 Ll:023813.1 200 IJAN 12 g791515 1327 1564
45 Ll:023813.1 200 IJAN 12 153381 OH 1 1331 1556
45 Ll:023813.1 200 IJAN 12 1721227H1 1325 1527
45 Ll:023813.1 200 IJAN 12 71594342V1 1327 1835
45 Ll:023813.1 200 IJAN 12 1720660H1 1325 1551
46 LI:229030.1:2001JAN12 4781167T9 1 185
46 LI:229030.1:2001JAN12 4781167H1 1 172
46 LI:229030.1:2001JAN12 4781167F8 1 534
46 LI:229030.1:2001JAN12 1432196H1 54 179
46 LI:229030.1:2001JAN12 1432196R7 54 404
46 LI:229030.1:2001JAN12 1437682H1 432 534
47 LI:1072894.9:2001JAN12 4201204H1 1 166
48 LI:2031263.1:2001JAN12 6765672J1 470 957
48 LI:2031263.1:2001JAN12 1533163H1 509 731
48 LI:2031263.1:2001JAN12 1534174H1 516 731
48 LI:2031263.1:2001JAN12 1660628F6 373 711
48 LI:2031263.1:2001JAN12 1660628H1 496 711
48 LI:2031263.1:2001JAN12 6825860J1 124 709
48 LI:2031263.1:2001JAN12 6825860H1 124 653
48 LI:2031263.1:2001JAN12 g4524399 543 966
48 LI:2031263.1:2001JAN12 g5741363 504 966
48 LI:2031263.1:2001JAN12 g6699720 565 965
48 LI:2031263.1:2001JAN12 g3076052 1 236
49 LI:432285.3:2001JAN12 71345241V1 1803 2437
49 LI:432285.3:2001JAN12 7628344H1 1860 2435
49 LI:432285.3:2001JAN12 7752313H1 1925 2593
49 LI:432285.3:2001JAN12 71343235V1 2056 2627
49 LI:432285.3:2001JAN12 7636991J1 2072 2591
49 LI:432285.3:2001JAN12 g6704845 2098 2604
49 LI:432285.3:2001JAN12 g6704850 2098 2604
49 LI:432285.3:2001JAN12 7672983H1 2127 2578
49 LI:432285.3:2001JAN12 71347709V1 2149 2628
49 LI:432285.3:2001JAN12 71343267V1 1112 1592
49 LI:432285.3:2001JAN12 71345201V1 1117 1693
49 LI:432285.3:2001JAN12 71348163V1 1118 1536
49 LI:432285.3:2001JAN12 71340293V1 1203 1633
49 LI:432285.3:2001JAN12 71343535V1 1204 1910
49 LI:432285.3:2001JAN12 71340167V1 1203 1635
49 LI:432285.3:2001JAN12 7937207H1 1222 1861
49 LI:432285.3:2001JAN12 71348037V1 1240 1772
49 LI:432285.3:2001JAN12 55013979J1 1248 1549
49 LI:432285.3:2001JAN12 55013979H1 1253 1552
49 LI:432285.3:2001JAN12 71340074V1 1284 2060
49 LI:432285.3:2001JAN12 71346959V1 1304 1902
49 LI:432285.3:2001JAN12 7636991H1 1407 1897
49 LI:432285.3:2001JAN12 71347053V1 1410 1875
49 LI:432285.3:2001JAN12 7752313J1 1419 2050
49 LI:432285.3:2001JAN12 71348412V1 1427 2053
49 LI:432285.3:2001JAN12 71342850V1 1429 2154
49 LI:432285.3:2001JAN12 71345305V1 1440 2043
49 LI:432285.3:2001JAN12 71347659V1 1461 2074
49 LI:432285.3:2001JAN12 71343325V1 1490 2163
49 LI:432285.3:2001JAN12 8112161H1 1527 2154
49 LI:432285.3:2001JAN12 7938749H1 1527 2114
49 LI:432285.3:2001JAN12 7403755H1 1556 2033
49 LI:432285.3:2001JAN12 71345015V1 1564 1987
49 LI:432285.3:2001JAN12 71345374V1 1564 1987
49 LI:432285.3:2001JAN12 7584167H1 1621 2200
49 LI:432285.3:2001JAN12 71345490V1 1635 2292
49 LI:432285.3:2001JAN12 7946452H1 1 423
49 LI:432285.3:2001JAN12 7946452J1 308 903
49 LI:432285.3:2001JAN12 7979048H1 473 1108
49 LI:432285.3:2001JAN12 8122922H1 472 982
49 LI:432285.3:2001JAN12 7978405H1 474 1071
49 LI:432285.3:2001JAN12 8114782H1 487 1160
49 LI:432285.3:2001JAN12 7461856H1 492 1108
49 LI:432285.3:2001JAN12 8114589H1 492 1096
Figure imgf000245_0001
TABLE 5
SEQ ID NO: Template ID Component ID Start Stop
50 LI:1177772.30:2001JAN12 g4901036 545 1003
50 LI:1177772.30:2001JAN12 g5367522 545 1005
50 LI:1177772.30:2001JAN12 2657123H1 597 838
50 LI:1177772.30:2001JAN12 g1550158 607 777
50 LI:1177772.30:2001JAN12 1259464F6 427 895
50 LI:1177772.30:2001JAN12 4993792H1 80 355
50 LI:1177772.30:2001JAN12 2770226H1 98 344
50 LI:1177772.30:2001JAN12 5918564H1 125 390
50 LI:1177772.30:2001JAN12 2264783H1 691 921
50 LI:1177772.30:2001JAN12 831367H1 696 965
50 LI:1177772.30:2001JAN12 4750163H1 624 782
50 LI:1177772.30:2001JAN12 1614463F6 39 505
50 LI:1177772.30:2001JAN12 1614459H1 39 234
50 LI:1177772.30:2001JAN12 8054823J1 68 607
50 LI:1177772.30:2001JAN12 5206754H1 1 236
50 LI:1177772.30:2001JAN12 5206754F6 1 598
50 LI:1177772.30:2001JAN12 1549994H1 33 224
51 LI:475420.2:2001JAN12 6330686F6 555 1150
51 LI:475420.2:2001JAN12 8185916H1 586 1194
51 LI:475420.2:2001JAN12 6330686H1 762 1150
51 LI:475420.2:2001JAN12 1483315H1 835 1120
51 LI:475420.2:2001JAN12 6145789H1 2015 2462
51 LI:475420.2:2001JAN12 6149407H1 2030 2592
51 LI:475420.2:2001JAN12 8188384H1 2111 2726
51 LI:475420.2:2001JAN12 g1472143 2086 2376
51 LI:475420.2:2001JAN12 8181166H1 2955 3574
51 LI:475420.2:2001JAN12 6146433H1 2956 3163
51 LI:475420.2:2001JAN12 8186187H1 3109 3666
51 LI:475420.2:2001JAN12 6150104H1 3201 3641
51 LI:475420.2:2001JAN12 8100783H1 3279 3933
51 LI:475420.2:2001JAN12 g2016435 3531 3837
51 LI:475420.2:2001JAN12 8097088H1 3549 3991
51 LI:475420.2:2001JAN12 8097549H1 3735 4022
51 LI:475420.2:2001JAN12 6340939H1 3803 4391
51 LI:475420.2:2001JAN12 g2017181 3842 4030
51 LI:475420.2:2001JAN12 8183327H1 4017 4567
51 LI:475420.2:2001JAN12 6867808H1 4147 4675
51 LI:475420.2:2001JAN12 g2016647 4168 4297
51 LI:475420.2:2001JAN12 g6038570 4298 4714
51 LI:475420.2:2001JAN12 g1472041 4346 4716
51 LI:475420.2:2001JAN12 g1068067 4520 4680
51 LI:475420.2:2001JAN12 g2806108 4524 4706
51 LI:475420.2:2001JAN12 g1099271 4524 4661
51 LI:475420.2:2001JAN12 6330686T6 1 488
51 LI:475420.2:2001JAN12 g186532 334 4722
51 LI:475420.2:2001JAN12 g1068090 344 609
51 LI:475420.2:2001JAN12 g186542 453 4689
52 LI:017599.3:2001JAN12 2885076H1 12 63
52 LI:017599.3:2001JAN12 2885076F6 1 178
52 LI:017599.3:2001JAN12 6598423T8 1 219
Figure imgf000247_0001
Figure imgf000247_0002
TABLE 5
SEQ ID NO: Template ID Component ID Start Stop
53 LI:030502.2:2001JAN12 70931538V1 1152 1537
53 LI:030502.2:2001JAN12 71278648V1 1154 1506
53 LI:030502.2:2001JAN12 70924640V1 1154 1258
53 LI:030502.2:2001JAN12 70924914V1 1154 1209
53 LI:030502.2:2001JAN12 71278312V1 1154 1609
53 LI:030502.2:2001JAN12 71277968V1 1154 1344
53 LI:030502.2:2001JAN12 70932007V1 1154 1603
53 LI:030502.2:2001JAN12 71277644V1 1154 1601
53 LI:030502.2:2001JAN12 70929617V1 1154 1589
53 LI:030502.2:2001JAN12 70931179V1 1154 1595
53 LI:030502.2:2001JAN12 3859766H1 546 838
53 LI:030502.2:2001JAN12 70929235V1 407 871
53 LI:030502.2:2001JAN12 70930027V1 407 871
53 LI:030502.2:2001JAN12 71277273V1 407 871
53 LI:030502.2:2001JAN12 g4124898 493 704
53 LI:030502.2:2001JAN12 70931645V1 479 871
53 LI:030502.2:2001JAN12 71278044V1 467 871
53 LI:030502.2:2001JAN12 g820797 1707 1980
53 LI:030502.2:2001JAN12 71277723V1 1773 2013
53 LI:030502.2:2001JAN12 2011581H1 334 449
53 LI:030502.2:2001JAN12 g874529 403 764
53 LI:030502.2:2001JAN12 g766269 403 673
53 LI:030502.2:2001JAN12 g8099152 405 1666
53 LI:030502.2:2001JAN12 70929351V1 407 871
53 LI:030502.2:2001JAN12 71277386V1 407 871
53 LI:030502.2:2001JAN12 70932434V1 407 871
53 LI:030502.2:2001JAN12 70931404V1 407 871
53 LI:030502.2:2001JAN12 70929721V1 407 871
53 LI:030502.2:2001JAN12 70929619V1 407 871
53 LI:030502.2:2001JAN12 70977552V1 1608 1967
53 LI:030502.2:2001JAN12 70932714V1 439 871
53 LI:030502.2:2001JAN12 4419243T6 1578 1922
53 LI:030502.2:2001JAN12 g767063 433 657
53 LI:030502.2:2001JAN12 70931214V1 1566 1978
53 LI:030502.2:2001JAN12 71278359V1 408 903
53 LI:030502.2:2001JAN12 71276789V1 1543 1967
53 LI:030502.2:2001JAN12 g5836378 1510 1972
53 LI:030502.2:2001JAN12 70929884V1 1530 1963
54 LI:1181337.3:2001JAN12 6934884H1 1 588
54 LI:1181337.3:2001JAN12 815760T6 459 1013
54 LI:1181337.3:2001JAN12 2184261H1 463 755
54 LI:1181337.3:2001JAN12 2184260F6 463 898
54 LI:1181337.3:2001JAN12 2255550H1 501 753
54 LI:1181337.3:2001JAN12 70451917V1 543 1024
54 LI:1181337.3:2001JAN12 70449692V1 585 903
54 LI:1181337.3:2001JAN12 70451723V1 585 900
54 LI:1181337.3:2001JAN12 2043836H1 603 733
54 LI:1181337.3:2001JAN12 g3430411 615 854
54 LI:1181337.3:2001JAN12 2184260T6 629 997
54 LI:1181337.3:2001JAN12 1493761H1 667 889
Figure imgf000249_0001
Figure imgf000249_0002
Figure imgf000250_0001
TABLE 5
SEQ ID NO: Template ID Component ID Start Stop
56 LI:1167059.4:2001JAN12 1274804F1 1781 2322
56 LI:1167059.4:2001JAN12 1274804F6 1783 2384
56 LI:1167059.4:2001JAN12 6956773R8 1907 2461
56 LI:1167059.4:2001JAN12 1598831H1 1939 2119
56 LI:1167059.4:2001JAN12 909852R6 2484 2869
56 LI:1167059.4:2001JAN12 4172054F8 2513 2638
56 LI:1167059.4:2001JAN12 4172054H1 2513 2632
56 LI:1167059.4:2001JAN12 5290168H1 2513 2637
56 LI:1167059.4:2001JAN12 g2254483 2568 2639
56 LI:1167059.4:2001JAN12 909852H1 2482 2651
56 LI:1167059.4:2001JAN12 909913H1 2483 2737
56 LI:1167059.4:2001JAN12 g2569187 2335 2611
56 LI:1167059.4:2001JAN12 5688131H1 2406 2685
56 LI:1167059.4:2001JAN12 779951H1 2428 2638
56 LI:1167059.4:2001JAN12 3480033H1 2443 2633
56 LI:1167059.4:2001JAN12 5688131F6 2445 2628
56 LI:1167059.4:2001JAN12 909852T6 2483 2715
56 LI:1167059.4:2001JAN12 2074623F6 1994 2598
56 LI:1167059.4:2001JAN12 4415757H1 2021 2272
56 LI:1167059.4:2001JAN12 3945872T8 2042 2490
56 LI:1167059.4:2001JAN12 1274804T6 1990 2460
56 LI:1167059.4:2001JAN12 3096033H1 2111 2356
56 LI:1167059.4:2001JAN12 2354687H1 2143 2365
56 LI:1167059.4:2001JAN12 5153807H1 2145 2395
56 LI:1167059.4:2001JAN12 6403753F8 2150 2638
56 LI:1167059.4:2001JAN12 4312207H1 2167 2455
56 LI:1167059.4:2001JAN12 g4079119 2193 2638
56 LI:1167059.4:2001JAN12 g6038165 2219 2638
56 LI:1167059.4:2001JAN12 g4001840 2220 2641
56 LI:1167059.4:2001JAN12 70286076V1 34 142
56 LI:1167059.4:2001JAN12 1271773H1 34 269
56 LI:1167059.4:2001JAN12 70281003V1 34 567
56 LI:1167059.4:2001JAN12 70280328V1 34 503
56 LI:1167059.4:2001JAN12 1271773T6 93 480
56 LI:1167059.4:2001JAN12 70522769V1 328 686
56 LI:1167059.4:2001JAN12 g7669975 610 2821
56 LI:1167059.4:2001JAN12 411210H1 1146 1261
56 LI:1167059.4:2001JAN12 412693H1 1146 1368
56 LI:1167059.4:2001JAN12 g1324449 1192 1595
56 LI:1167059.4:2001JAN12 6280938H1 1359 1627
56 LI:1167059.4:2001JAN12 6283790H1 1359 1592
56 LI:1167059.4:2001JAN12 3329321H1 1383 1673
56 LI:1167059.4:2001JAN12 70522794V1 1387 2001
56 LI:1167059.4:2001JAN12 2286345H1 1414 1651
56 LI:1167059.4:2001JAN12 3552749H1 1432 1582
56 LI:1167059.4:2001JAN12 2673881H1 1436 1651
56 LI:1167059.4:2001JAN12 4792347F6 1 527
56 LI:1167059.4:2001JAN12 70282235V1 34 518
56 LI:1167059.4:2001JAN12 70279436V1 34 512
56 LI:1167059.4:2001JAN12 2791254H1 33 153
56 LI:1167059.4:2001JAN12 70279978V1 34 612
56 LI:1167059.4:2001JAN12 70278067V1 34 524
56 LI:1167059.4:2001JAN12 g7152964 2242 2638
56 LI:1167059.4:2001JAN12 6059623F8 2255 2544
56 LI:1167059.4:2001JAN12 g4534234 2273 2640
56 LI:1167059.4:2001JAN12 5307220H1 2276 2507
56 LI:1167059.4:2001JAN12 g4686416 2276 2622
56 LI:1167059.4:2001JAN12 1923562T6 2333 2600
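The column layout of Table 5 lends itself to a simple programmatic reading. The following is a minimal Python sketch, assuming each row holds five whitespace-separated fields in the column order shown and that Start and Stop are positions on the template. The names `Component`, `parse_row`, and `template_extent` are hypothetical and are not part of the original disclosure.

```python
from typing import NamedTuple


class Component(NamedTuple):
    seq_id: int
    template_id: str
    component_id: str
    start: int
    stop: int


def parse_row(line: str) -> Component:
    """Split one Table 5 row into its five whitespace-separated fields."""
    seq_id, template_id, component_id, start, stop = line.split()
    return Component(int(seq_id), template_id, component_id, int(start), int(stop))


def template_extent(rows):
    """Return {SEQ ID NO: (lowest start, highest stop)} over all components."""
    extent = {}
    for row in rows:
        low, high = extent.get(row.seq_id, (row.start, row.stop))
        extent[row.seq_id] = (min(low, row.start), max(high, row.stop))
    return extent


# Three SEQ ID NO:46 rows taken from Table 5, with spacing normalized.
rows = [
    parse_row("46 LI:229030.1:2001JAN12 4781167T9 1 185"),
    parse_row("46 LI:229030.1:2001JAN12 1432196R7 54 404"),
    parse_row("46 LI:229030.1:2001JAN12 1437682H1 432 534"),
]
extent = template_extent(rows)
```

For these three rows the components together span positions 1 through 534 of the template.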
TABLE 6
SEQ ID NO: Template ID Tissue Distribution
1 LI:1983416.1:2001JAN12 Digestive System - 83%, Female Genitalia - 17%
2 LI:332263.1:2001JAN12 Connective Tissue - 14%, Musculoskeletal System - 14%, Male Genitalia - 12%
3 LI:333886.4:2001JAN12 Unclassified/Mixed - 79%
4 LI:478508.1:2001JAN12 Germ Cells - 63%, Male Genitalia - 16%, Unclassified/Mixed - 10%, Nervous System - 10%
5 LI:307470.1:2001JAN12 Nervous System - 55%, Female Genitalia - 27%, Hemic and Immune System - 18%
6 LI:058298.1:2001JAN12 Urinary Tract - 75%, Male Genitalia - 25%
7 LI:205527.5:2001JAN12 Nervous System - 100%
8 LI:231587.1:2001JAN12 Nervous System - 57%, Female Genitalia - 43%
9 LI:402919.1:2001JAN12 Germ Cells - 45%, Nervous System - 33%
10 LI:463283.1:2001JAN12 Female Genitalia - 50%, Respiratory System - 50%
11 LI:072560.1:2001JAN12 Female Genitalia - 58%, Nervous System - 16%, Endocrine System - 15%
12 LI:1953096.1:2001JAN12 Respiratory System - 100%
13 LI:1076016.1:2001JAN12 Sense Organs - 69%, Liver - 11%, Pancreas - 10%
14 LI:2082796.1:2001JAN12 Hemic and Immune System - 100%
15 LI:335681.3:2001JAN12 Unclassified/Mixed - 30%, Exocrine Glands - 28%, Male Genitalia - 15%
16 LI:214150.1:2001JAN12 Pancreas - 50%, Nervous System - 27%, Male Genitalia - 23%
17 LI:322783.15:2001JAN12 Connective Tissue - 87%
18 LI:422993.1:2001JAN12 Liver - 46%, Sense Organs - 40%
19 LI:1172885.1:2001JAN12 Exocrine Glands - 48%, Male Genitalia - 14%, Digestive System - 14%, Nervous System - 14%
20 LI:1088359.1:2001JAN12 Respiratory System - 16%, Embryonic Structures - 16%, Female Genitalia - 13%
21 LI:813422.1:2001JAN12 Stomatognathic System - 33%, Sense Organs - 22%, Cardiovascular System - 15%
22 LI:1186426.1:2001JAN12 Musculoskeletal System - 22%, Digestive System - 15%, Endocrine System - 13%
23 LI:1182817.1:2001JAN12 Stomatognathic System - 20%, Sense Organs - 13%
24 LI:1170153.9:2001JAN12 Female Genitalia - 88%, Male Genitalia - 13%
25 LI:1171553.1:2001JAN12 Urinary Tract - 27%, Sense Organs - 19%, Embryonic Structures - 13%
26 LI:2121978.1:2001JAN12 Hemic and Immune System - 100%
27 LI:1174292.5:2001JAN12 Urinary Tract - 10%
28 LI:1179173.1:2001JAN12 Respiratory System - 33%, Musculoskeletal System - 14%
29 LI:2122025.1:2001JAN12 Unclassified/Mixed - 55%, Female Genitalia - 34%
30 LI:2049224.1:2001JAN12 Endocrine System - 30%, Cardiovascular System - 25%, Exocrine Glands - 25%
31 LI:758541.1:2001JAN12 Exocrine Glands - 51%, Unclassified/Mixed - 13%
32 LI:137815.1:2001JAN12 Skin - 31%, Cardiovascular System - 15%, Sense Organs - 13%
33 LI:335097.1:2001JAN12 Embryonic Structures - 24%, Nervous System - 24%, Musculoskeletal System - 16%
34 LI:232059.2:2001JAN12 Urinary Tract - 51%, Endocrine System - 12%
35 LI:400109.2:2001JAN12 Embryonic Structures - 17%, Germ Cells - 16%
36 LI:329770.1:2001JAN12 Liver - 90%
37 LI:898841.9:2001JAN12 Endocrine System - 89%, Nervous System - 11%
38 LI:1183848.3:2001JAN12 Pancreas - 100%
39 LI:2037121.1:2001JAN12 Female Genitalia - 51%, Unclassified/Mixed - 23%, Musculoskeletal System - 12%
40 LI:356090.1:2001JAN12 Respiratory System - 91%
41 LI:212142.1:2001JAN12 Urinary Tract - 64%, Digestive System - 19%, Musculoskeletal System - 10%
42 LI:1096706.1:2001JAN12 Nervous System - 100%
43 LI:012622.1:2001JAN12 Unclassified/Mixed - 41%, Endocrine System - 17%, Pancreas - 16%
44 LI:1171095.29:2001JAN12 Hemic and Immune System - 31%, Digestive System - 12%
45 LI:023813.1:2001JAN12 Hemic and Immune System - 23%, Urinary Tract - 16%, Digestive System - 10%
46 LI:229030.1:2001JAN12 Pancreas - 65%, Digestive System - 18%, Respiratory System - 18%
47 LI:1072894.9:2001JAN12 Nervous System - 100%
48 LI:2031263.1:2001JAN12 Unclassified/Mixed - 54%, Germ Cells - 21%, Digestive System - 12%
49 LI:432285.3:2001JAN12 Stomatognathic System - 34%, Connective Tissue - 25%, Cardiovascular System - 13%
50 LI:1177772.30:2001JAN12 Sense Organs - 39%, Respiratory System - 11%
51 LI:475420.2:2001JAN12 Sense Organs - 82%, Endocrine System - 15%
52 LI:017599.3:2001JAN12 Respiratory System - 50%, Cardiovascular System - 23%, Pancreas - 13%
53 LI:030502.2:2001JAN12 Germ Cells - 68%, Liver - 16%
54 LI:1181337.3:2001JAN12 Digestive System - 51%, Unclassified/Mixed - 21%, Female Genitalia - 13%, Male Genitalia - 13%
55 LI:1164672.3:2001JAN12 Female Genitalia - 36%, Urinary Tract - 16%, Skin - 11%
56 LI:1167059.4:2001JAN12 Urinary Tract - 28%, Skin - 16%
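Each Tissue Distribution cell in Table 6 is a comma-separated list of "tissue - percentage" pairs. As a minimal sketch, assuming that cell format, such a cell could be converted into a mapping as follows. The function name `parse_distribution` is hypothetical and is not part of the original disclosure.

```python
def parse_distribution(cell: str) -> dict:
    """Convert 'Tissue A - 83%, Tissue B - 17%' into {'Tissue A': 83, 'Tissue B': 17}."""
    result = {}
    for part in cell.split(","):
        # Split on the last hyphen so the tissue name keeps any internal text.
        tissue, _, percent = part.rpartition("-")
        result[tissue.strip()] = int(percent.strip().rstrip("%"))
    return result


# Tissue distribution listed for SEQ ID NO:1 in Table 6.
seq1 = parse_distribution("Digestive System - 83%, Female Genitalia - 17%")
```

The listed percentages need not sum to 100, since several rows of the table report only the most prominent tissue categories.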
TABLE 7
SEQ ID NO: Frame Length Start Stop GI Number Probability Score Annotation
57 1 83 1 249 g3335126 3.00E-11 CI-KFYI protein
57 1 83 1 249 g2909860 3.00E-11 NADH-ubiquinone oxidoreductase subunit CI-KFYI
57 1 83 1 249 g12858549 2.00E-05 putative
59 2 144 191 622 g31207 9.00E-16 put. thyroid hormone receptor
59 2 144 191 622 g180253 9.00E-16 c-erbA
59 2 144 191 622 g4426901 1.00E-13 thyroid hormone receptor beta1
60 3 169 246 752 g12840055 1.00E-29 putative
60 3 169 246 752 g12838464 2.00E-22 putative
60 3 169 246 752 g12858779 6.00E-08 putative
61 1 66 556 753 g16551806 8.00E-18 (AK056408) unnamed protein product
61 1 66 556 753 g16041132 4.00E-17 hypothetical protein
61 1 66 556 753 g7020292 1.00E-16 unnamed protein product
62 1 154 277 738 g36615 2.00E-57 serine/threonine protein kinase
62 1 154 277 738 g6580288 4.00E-49 contains similarity to Pfam domain: PF00069 (Eukaryotic protein kinase domain), Score=307.5, E-value=5.1e-89, N=1
62 1 154 277 738 g7297009 2.00E-48 CG7236 gene product
64 2 77 179 409 g14043238 2.00E-09 Similar to hypothetical protein PRO1722
64 2 77 179 409 g403460 4.00E-09 transformation-related protein
64 2 77 179 409 g11493483 5.00E-09 PRO2550
65 2 258 281 1054 gδl06978 1.00E-71 homeobox protein GBX2
65 2 258 281 1054 g755767 3.00E-70 Stra7
65 2 258 281 1054 g3676057 3.00E-70 gastrulation-brain-homeobox-2
68 3 61 39 221 g15341934 4.00E-05 Unknown (protein for MGC:8826)
70 1 109 259 585 g4323152 1.00E-30 Ets-protein Spi-C
71 3 276 333 1160 g15986451 3.00E-49 Kruppel-type zinc-finger protein ZIM3
71 3 276 333 1160 g16551859 5.00E-49 (AK056452) unnamed protein product
71 3 276 333 1160 g14456631 2.00E-47 dJ54B20.4 (novel KRAB box containing C2H2 type zinc finger protein)
74 1 205 547 1161 g10440136 4.00E-96 unnamed protein product
74 1 205 547 1161 g12846755 1.00E-94 putative
74 1 205 547 1161 g12840994 1.00E-94 putative
75 3 66 393 590 g16552067 7.00E-22 (AK056615) unnamed protein product
75 3 66 393 590 g16549180 4.00E-19 (AK054606) unnamed protein product
75 3 66 393 590 g10437767 1.00E-18 unnamed protein product
76 1 144 1 432 g55483 2.00E-44 Zfp-1 protein (AA 1-424)
76 1 144 1 432 g387162 2.00E-44 finger protein (put.); putative
76 1 144 1 432 g5757626 5.00E-26 C2H2 zinc finger protein
78 1 263 1 789 g506502 1.00E-141 NK10
78 1 263 1 789 g16552044 1.00E-138 (AK056596) unnamed protein product
78 1 263 1 789 g488555 1.00E-92 zinc finger protein ZNF135
79 3 884 3 2654 g498152 0 ha0946 protein is Kruppel-related.
79 3 884 3 2654 g11181880 0 bA1021O19.1 (zinc finger protein 33a (KOX 31))
79 3 884 3 2654 g938238 0 ZNF11B
81 3 256 1101 1868 g16549180 6.00E-40 (AK054606) unnamed protein product
81 3 256 1101 1868 g14250716 2.00E-28 Unknown (protein for MGC:13310)
81 3 256 1101 1868 g2224593 3.00E-26 KIAA0326
83 2 566 134 1831 g2384653 0 Krueppel family zinc finger protein
83 2 566 134 1831 g4559318 0 BC273239J
83 2 566 134 1831 g16306806 0 (BC006528) zinc finger protein 43 (HTF6)
84 2 520 98 1657 g6118383 0 zinc finger protein ZNF223
84 2 520 98 1657 g10835284 0 Zinc finger protein ZNF223 (amino acids 82-482)
84 2 520 98 1657 g6598826 0 ZNF230-like
85 3 233 513 1211 g2689444 6.00E-76 ZNF134
85 3 233 513 1211 g16552811 2.00E-69 (AK057209) unnamed protein product
85 3 233 513 1211 g488553 8.00E-69 zinc finger protein ZNF134
86 1 141 22 444 g7023216 3.00E-55 unnamed protein product
86 1 141 22 444 g7023703 6.00E-30 unnamed protein product
86 1 141 22 444 g16878329 9.00E-22 (BC017357) Unknown (protein for MGC:29628)
87 3 112 42 377 g5679576 2.00E-62 zinc finger 41
87 3 112 42 377 g340444 2.00E-62 zinc finger protein 41
87 3 112 42 377 g15787775 2.00E-62 bB479F17.3 (zinc finger protein 41)
88 3 839 393 2909 g6409379 0 zinc finger protein ZNF229
88 3 839 393 2909 g10864174 0 ZNF229 (amino acids 1-420)
88 3 839 393 2909 g6984172 0 zinc finger protein ZNF226
91 1 77 370 600 g1389766 3.00E-13 unknown
91 1 77 370 600 g6855613 5.00E-13 PRO0974
91 1 77 370 600 g9929935 1.00E-09 hypothetical protein
92 1 70 505 714 g12698182 1.00E-19 hypothetical protein
92 1 70 505 714 g10439739 9.00E-19 unnamed protein product
92 1 70 505 714 g14388331 6.00E-18 hypothetical protein
97 1 61 115 297 g1871540 8.00E-06 plakophilin 2b
97 1 61 115 297 g13623263 2.00E-05 Similar to inhibitor of kappa light polypeptide gene enhancer in B-cells, kinase beta
99 1 197 400 990 g3724141 2.00E-61 myosin 1
99 1 197 400 990 g7297714 6.00E-37 Myo31DF gene product
99 1 197 400 990 g466256 6.00E-37 myosin-IA
102 3 88 171 434 g10437485 4.00E-21 unnamed protein product
102 3 88 171 434 g10437569 1.00E-20 unnamed protein product
102 3 88 171 434 g10441877 6.00E-19 unknown
103 3 51 9 161 g8926741 4.00E-08 hypothetical protein
103 3 51 9 161 g8926693 4.00E-08 hypothetical protein
103 3 51 9 161 g6841518 4.00E-08 HSPC148
105 3 105 939 1253 g16877350 7.00E-09 (BC016928) ornithine aminotransferase (gyrate atrophy)
105 3 105 939 1253 g386987 7.00E-09 ornithine aminotransferase
105 3 105 939 1253 g34138 7.00E-09 precursor (AA -35 to 404)
108 3 401 954 2156 g386835 0 interstitial retinol-binding protein precursor
108 3 401 954 2156 g307075 0 retinol-binding protein
108 3 401 954 2156 g307074 0 IRBP precursor
111 3 183 6 554 g1196425 5.00E-32 envelope protein
111 3 183 6 554 g323899 2.00E-06 envelope protein precursor
111 3 183 6 554 g10946419 2.00E-06 envelope protein
112 2 106 3068 3385 g6688950 7.00E-29 viral polyprotein
112 2 106 3068 3385 g6688948 7.00E-29 viral polypeptide
112 2 106 3068 3385 g6688946 7.00E-29 viral polypeptide
113 3 370 51 1160 g7688657 0 septin 10
113 3 370 51 1160 g10432915 0 unnamed protein product
113 3 370 51 1160 g7023141 1.00E-146 unnamed protein product
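In the rows of Table 7 where all of Frame, Length, Start, and Stop are legible, the nucleotide span from Start to Stop equals three times Length, and Frame follows from Start. A minimal sketch of that consistency check is given below. It rests on the assumption that Length counts amino acids and that Start and Stop are 1-based nucleotide positions on the forward strand. The function names are hypothetical and are not part of the original disclosure.

```python
def reading_frame(start: int) -> int:
    """Forward reading frame (1, 2, or 3) implied by a 1-based start position."""
    return (start - 1) % 3 + 1


def is_consistent(frame: int, length: int, start: int, stop: int) -> bool:
    """Check that the frame matches the start and the span covers `length` codons."""
    return reading_frame(start) == frame and stop - start + 1 == 3 * length


# (Frame, Length, Start, Stop) values taken from rows of Table 7.
checks = [
    is_consistent(1, 83, 1, 249),      # SEQ ID NO:57
    is_consistent(2, 144, 191, 622),   # SEQ ID NO:59
    is_consistent(3, 169, 246, 752),   # SEQ ID NO:60
    is_consistent(3, 884, 3, 2654),    # SEQ ID NO:79
]
```

A check of this kind is useful for spotting transcription errors in a row, since a single altered digit in Start, Stop, or Length breaks the relationship.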
TABLE 8
Program | Description | Reference | Parameter Threshold
ABI FACTURA | A program that removes vector sequences and masks ambiguous bases in nucleic acid sequences. | Applied Biosystems, Foster City, CA. |
ABI/PARACEL FDF | A Fast Data Finder useful in comparing and annotating amino acid or nucleic acid sequences. | Applied Biosystems, Foster City, CA; Paracel Inc., Pasadena, CA. | Mismatch <50%
ABI AutoAssembler | A program that assembles nucleic acid sequences. | Applied Biosystems, Foster City, CA. |
BLAST | A Basic Local Alignment Search Tool useful in sequence similarity search for amino acid and nucleic acid sequences. BLAST includes five functions: blastp, blastn, blastx, tblastn, and tblastx. | Altschul, S.F. et al. (1990) J. Mol. Biol. 215:403-410; Altschul, S.F. et al. (1997) Nucleic Acids Res. 25:3389-3402. | ESTs: Probability value=1.0E-8 or less; Full Length sequences: Probability value=1.0E-10 or less
FASTA | A Pearson and Lipman algorithm that searches for similarity between a query sequence and a group of sequences of the same type. FASTA comprises at least five functions: fasta, tfasta, fastx, tfastx, and ssearch. | Pearson, W.R. and D.J. Lipman (1988) Proc. Natl. Acad Sci. USA 85:2444-2448; Pearson, W.R. (1990) Methods Enzymol. 183:63-98; and Smith, T.F. and M.S. Waterman (1981) Adv. Appl. Math. 2:482-489. | ESTs: fasta E value=1.06E-6; Assembled ESTs: fasta Identity=95% or greater and Match length=200 bases or greater; fastx value=1.0E-8 or less; Full Length sequences: fastx score=100 or greater
BLIMPS | A BLocks IMProved Searcher that matches a sequence against those in BLOCKS, PRINTS, DOMO, PRODOM, and PFAM databases to search for gene families, sequence homology, and structural fingerprint regions. | Henikoff, S. and J.G. Henikoff (1991) Nucleic Acids Res. 19:6565-6572; Henikoff, J.G. and S. Henikoff (1996) Methods Enzymol. 266:88-105; and Attwood, T.K. et al. (1997) J. Chem. Inf. Comput. Sci. 37:417-424. | Probability value=1.0E-3 or less
HMMER | An algorithm for searching a query sequence against hidden Markov model (HMM)-based databases of protein family consensus sequences, such as PFAM. | Krogh, A. et al. (1994) J. Mol. Biol. 235:1501-1531; Sonnhammer, E.L.L. et al. (1998) Nucleic Acids Res. 26:320-322; Durbin, R. et al. (1998) Our World View, in a Nutshell, Cambridge Univ. Press, pp. 1-350. | PFAM hits: Probability value=1.0E-3 or less; Signal peptide hits: Score=0 or greater
ProfileScan | An algorithm that searches for structural and sequence motifs in protein sequences that match sequence patterns defined in Prosite. | Gribskov, M. et al. (1988) CABIOS 4:61-66; Gribskov, M. et al. (1989) Methods Enzymol. 183:146-159; Bairoch, A. et al. (1997) Nucleic Acids Res. 25:217-221. | Normalized quality score≥GCG-specified "HIGH" value for that particular Prosite motif. Generally, score=1.4-2.1.
Phred | A base-calling algorithm that examines automated sequencer traces with high sensitivity and probability. | Ewing, B. et al. (1998) Genome Res. 8:175-185; Ewing, B. and P. Green (1998) Genome Res. 8:186-194. |
Phrap | A Phils Revised Assembly Program including SWAT and CrossMatch, programs based on efficient implementation of the Smith-Waterman algorithm, useful in searching sequence homology and assembling DNA sequences. | Smith, T.F. and M.S. Waterman (1981) Adv. Appl. Math. 2:482-489; Smith, T.F. and M.S. Waterman (1981) J. Mol. Biol. 147:195-197; and Green, P., University of Washington, Seattle, WA. | Score=120 or greater; Match length=56 or greater
Consed | A graphical tool for viewing and editing Phrap assemblies. | Gordon, D. et al. (1998) Genome Res. 8:195-202. |
SPScan | A weight matrix analysis program that scans protein sequences for the presence of secretory signal peptides. | Nielson, H. et al. (1997) Protein Engineering 10:1-6; Claverie, J.M. and S. Audic (1997) CABIOS 12:431-439. | Score=3.5 or greater
TMAP | A program that uses weight matrices to delineate transmembrane segments on protein sequences and determine orientation. | Persson, B. and P. Argos (1994) J. Mol. Biol. 237:182-192; Persson, B. and P. Argos (1996) Protein Sci. 5:363-371. |
TMHMMER | A program that uses a hidden Markov model (HMM) to delineate transmembrane segments on protein sequences and determine orientation. | Sonnhammer, E.L. et al. (1998) Proc. Sixth Intl. Conf. On Intelligent Systems for Mol. Biol., Glasgow et al., eds., The Am. Assoc. for Artificial Intelligence (AAAI) Press, Menlo Park, CA, and MIT Press, Cambridge, MA, pp. 175-182. |
Motifs | A program that searches amino acid sequences for patterns that matched those defined in Prosite. | Bairoch, A. et al. (1997) Nucleic Acids Res. 25:217-221; Wisconsin Package Program Manual, version 9, page M51-59, Genetics Computer Group, Madison, WI. |
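The BLAST parameter thresholds in Table 8 amount to an upper bound on the probability value of a hit, with a stricter bound for full-length sequences than for ESTs. A minimal sketch of applying those two bounds is shown below. The dictionary and function names are hypothetical, and the sketch does not call BLAST itself; it only filters probability values that have already been obtained.

```python
# Upper bounds on the BLAST probability value, as listed in Table 8.
BLAST_MAX_PROBABILITY = {
    "est": 1.0e-8,
    "full_length": 1.0e-10,
}


def passes_blast_threshold(probability: float, kind: str) -> bool:
    """Return True if a hit's probability value is at or below the Table 8 bound."""
    return probability <= BLAST_MAX_PROBABILITY[kind]


# Probability scores of the kind reported in Table 7.
strong_hit = passes_blast_threshold(3.0e-11, "full_length")
weak_hit = passes_blast_threshold(2.0e-5, "est")
```

Here the first value satisfies the full-length bound, while the second does not satisfy even the more permissive EST bound.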

Claims

What is claimed is:
1. An isolated polynucleotide comprising a polynucleotide sequence selected from the group consisting of: a) a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56, b) a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56, c) a polynucleotide sequence complementary to a), d) a polynucleotide sequence complementary to b), and e) an RNA equivalent of a) through d).
2. An isolated polynucleotide of claim 1, comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-56.
3. An isolated polynucleotide comprising at least 60 contiguous nucleotides of a polynucleotide of claim 1.
4. A composition for the detection of expression of diagnostic and therapeutic polynucleotides comprising at least one of the polynucleotides of claim 1 and a detectable label.
5. A method for detecting a target polynucleotide in a sample, said target polynucleotide having a sequence of a polynucleotide of claim 1, the method comprising: a) amplifying said target polynucleotide or fragment thereof using polymerase chain reaction amplification, and b) detecting the presence or absence of said amplified target polynucleotide or fragment thereof, and, optionally, if present, the amount thereof.
6. A method for detecting a target polynucleotide in a sample, said target polynucleotide comprising a sequence of a polynucleotide of claim 1, the method comprising: a) hybridizing the sample with a probe comprising at least 20 contiguous nucleotides comprising a sequence complementary to said target polynucleotide in the sample, and which probe specifically hybridizes to said target polynucleotide, under conditions whereby a hybridization complex is formed between said probe and said target polynucleotide or fragments thereof, and b) detecting the presence or absence of said hybridization complex, and, optionally, if present, the amount thereof.
7. A method of claim 5, wherein the probe comprises at least 30 contiguous nucleotides.
8. A method of claim 5, wherein the probe comprises at least 60 contiguous nucleotides.
9. A recombinant polynucleotide comprising a promoter sequence operably linked to a polynucleotide of claim 1.
10. A cell transformed with a recombinant polynucleotide of claim 9.
11. A transgenic organism comprising a recombinant polynucleotide of claim 9.
12. A method for producing a diagnostic and therapeutic polypeptide, the method comprising: a) culturing a cell under conditions suitable for expression of the diagnostic and therapeutic polypeptide, wherein said cell is transformed with a recombinant polynucleotide of claim 9, and b) recovering the diagnostic and therapeutic polypeptide so expressed.
13. A purified diagnostic and therapeutic polypeptide (DITHP) encoded by at least one of the polynucleotides of claim 2.
14. An isolated antibody which specifically binds to a diagnostic and therapeutic polypeptide of claim 13.
15. A method of identifying a test compound which specifically binds to the diagnostic and therapeutic polypeptide of claim 13, the method comprising the steps of: a) providing a test compound; b) combining the diagnostic and therapeutic polypeptide with the test compound for a sufficient time and under suitable conditions for binding; and c) detecting binding of the diagnostic and therapeutic polypeptide to the test compound, thereby identifying the test compound which specifically binds the diagnostic and therapeutic polypeptide.
16. A microarray wherein at least one element of the microarray is a polynucleotide of claim 3.
17. A method for generating a transcript image of a sample which contains polynucleotides, the method comprising the steps of: a) labeling the polynucleotides of the sample, b) contacting the elements of the microarray of claim 16 with the labeled polynucleotides of the sample under conditions suitable for the formation of a hybridization complex, and c) quantifying the expression of the polynucleotides in the sample.
18. A method for screening a compound for effectiveness in altering expression of a target polynucleotide, wherein said target polynucleotide comprises a polynucleotide sequence of claim 1, the method comprising: a) exposing a sample comprising the target polynucleotide to a compound, under conditions suitable for the expression of the target polynucleotide, b) detecting altered expression of the target polynucleotide, and c) comparing the expression of the target polynucleotide in the presence of varying amounts of the compound and in the absence of the compound.
19. A method for assessing toxicity of a test compound, said method comprising: a) treating a biological sample containing nucleic acids with the test compound; b) hybridizing the nucleic acids of the treated biological sample with a probe comprising at least 20 contiguous nucleotides of a polynucleotide of claim 1 under conditions whereby a specific hybridization complex is formed between said probe and a target polynucleotide in the biological sample, said target polynucleotide comprising a polynucleotide sequence of a polynucleotide of claim 1 or fragment thereof; c) quantifying the amount of hybridization complex; and d) comparing the amount of hybridization complex in the treated biological sample with the amount of hybridization complex in an untreated biological sample, wherein a difference in the amount of hybridization complex in the treated biological sample is indicative of toxicity of the test compound.
20. An array comprising different nucleotide molecules affixed in distinct physical locations on a solid substrate, wherein at least one of said nucleotide molecules comprises a first oligonucleotide or polynucleotide sequence specifically hybridizable with at least 30 contiguous nucleotides of a target polynucleotide, said target polynucleotide having a sequence of claim 1.
21. An array of claim 20, wherein said first oligonucleotide or polynucleotide sequence is completely complementary to at least 30 contiguous nucleotides of said target polynucleotide.
22. An array of claim 20, wherein said first oligonucleotide or polynucleotide sequence is completely complementary to at least 60 contiguous nucleotides of said target polynucleotide.
23. An array of claim 20, which is a microarray.
24. An array of claim 20, further comprising said target polynucleotide hybridized to said first oligonucleotide or polynucleotide.
25. An array of claim 20, wherein a linker joins at least one of said nucleotide molecules to said solid substrate.
26. An array of claim 20, wherein each distinct physical location on the substrate contains multiple nucleotide molecules having the same sequence, and each distinct physical location on the substrate contains nucleotide molecules having a sequence which differs from the sequence of nucleotide molecules at another physical location on the substrate.
27. An isolated polypeptide comprising an amino acid sequence selected from the group consisting of: a) an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, b) a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, c) a biologically active fragment of an amino acid sequence selected from the group consisting of SEQ ID NO:57-113, and d) an immunogenic fragment of an amino acid sequence selected from the group consisting of SEQ ID NO:57-113.
28. An isolated polypeptide of claim 27, comprising a polypeptide sequence selected from the group consisting of SEQ ID NO:57-113.
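
For readers less familiar with the sequence relationships recited in the claims above (a complementary sequence, an RNA equivalent, and a probe of at least 20 contiguous complementary nucleotides), the following minimal Python sketch shows one way they could be computed on plain sequence strings. It is illustrative only and does not define or limit any claim term. The function names and the example sequences are hypothetical. Exact base-pair complementarity is used here as a simplifying stand-in for specific hybridization, which in practice depends on experimental conditions.

```python
# Watson-Crick pairing table for DNA (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def rna_equivalent(dna: str) -> str:
    """Return the same base sequence with thymine replaced by uracil."""
    return dna.replace("T", "U").replace("t", "u")


def probe_is_complementary(probe: str, target: str, min_len: int = 20) -> bool:
    """True if the probe has at least min_len contiguous nucleotides that are
    exactly complementary to some stretch of the target (a simplification)."""
    bound = reverse_complement(probe).upper()
    target = target.upper()
    for i in range(len(bound) - min_len + 1):
        if bound[i:i + min_len] in target:
            return True
    return False


if __name__ == "__main__":
    target = "ACGT" * 10                        # made-up 40-nucleotide target
    probe = reverse_complement(target[5:30])    # 25-nucleotide probe
    print(reverse_complement("ATGC"))           # GCAT
    print(rna_equivalent("ATGC"))               # AUGC
    print(probe_is_complementary(probe, target))
```

The percent identity thresholds in the claims are not sketched here, because they depend on the alignment method and parameters used.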
PCT/US2002/001009 2001-01-12 2002-01-09 Molecules for diagnostics and therapeutics WO2002079473A2 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP02733781A EP1366166A2 (en) 2001-01-12 2002-01-09 Molecules for diagnostics and therapeutics
US10/250,889 US20040115629A1 (en) 2002-01-09 2002-01-09 Molecules for diagnostics and therapeutics
CA002434677A CA2434677A1 (en) 2001-01-12 2002-01-09 Molecules for diagnostics and therapeutics

Applications Claiming Priority (38)

Application Number Priority Date Filing Date Title
US26162201P 2001-01-12 2001-01-12
US60/261,622 2001-01-12
US26186401P 2001-01-16 2001-01-16
US26186501P 2001-01-16 2001-01-16
US60/261,865 2001-01-16
US60/261,864 2001-01-16
US26220701P 2001-01-17 2001-01-17
US26221501P 2001-01-17 2001-01-17
US26220801P 2001-01-17 2001-01-17
US26216401P 2001-01-17 2001-01-17
US26220901P 2001-01-17 2001-01-17
US60/262,208 2001-01-17
US60/262,207 2001-01-17
US60/262,215 2001-01-17
US60/262,209 2001-01-17
US60/262,164 2001-01-17
US26310201P 2001-01-18 2001-01-18
US60/263,102 2001-01-18
US26306901P 2001-01-19 2001-01-19
US26306301P 2001-01-19 2001-01-19
US26259901P 2001-01-19 2001-01-19
US26306401P 2001-01-19 2001-01-19
US26266201P 2001-01-19 2001-01-19
US26332901P 2001-01-19 2001-01-19
US26307701P 2001-01-19 2001-01-19
US26276001P 2001-01-19 2001-01-19
US26306501P 2001-01-19 2001-01-19
US26333001P 2001-01-19 2001-01-19
US60/263,063 2001-01-19
US60/263,064 2001-01-19
US60/263,065 2001-01-19
US60/263,069 2001-01-19
US60/263,330 2001-01-19
US60/262,599 2001-01-19
US60/263,077 2001-01-19
US60/263,329 2001-01-19
US60/262,662 2001-01-19
US60/262,760 2001-01-19

Publications (2)

Publication Number Publication Date
WO2002079473A2 (en) 2002-10-10
WO2002079473A3 (en) 2003-10-02

Family

ID=27586320

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2002/001009 WO2002079473A2 (en) 2001-01-12 2002-01-09 Molecules for diagnostics and therapeutics

Country Status (3)

Country Link
EP (1) EP1366166A2 (en)
CA (1) CA2434677A1 (en)
WO (1) WO2002079473A2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004105782A3 (en) * 2003-05-29 2005-04-21 Gaslini Children S Hospital G Drug delivery systems for tumor targeting ngr-molecules and uses thereof
FR2984364A1 (en) * 2011-12-20 2013-06-21 Biomerieux Sa METHOD FOR IN VITRO DIAGNOSIS OR PROGNOSIS OF OVARIAN CANCER

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999036525A1 (en) * 1998-01-19 1999-07-22 Shanghai Second Medical University Cbfblh12: a gene highly related to bovine ci-kfyi gene for ubiquinone oxireductase complex
JP2002513765A (en) * 1998-05-07 2002-05-14 ザ リージェンツ オブ ザ ユニバーシティ オブ カリフォルニア Use of neglected target tissue antigens in the modulation of the immune response
FR2797402B1 (en) * 1999-07-15 2004-03-12 Biomerieux Stelhys USE OF A POLYPEPTIDE FOR DETECTING, PREVENTING OR TREATING A CONDITION ASSOCIATED WITH A DEGENERATIVE, NEUROLOGICAL OR AUTOIMMUNE DISEASE
US6436703B1 (en) * 2000-03-31 2002-08-20 Hyseq, Inc. Nucleic acids and polypeptides

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004105782A3 (en) * 2003-05-29 2005-04-21 Gaslini Children S Hospital G Drug delivery systems for tumor targeting ngr-molecules and uses thereof
US7479483B2 (en) 2003-05-29 2009-01-20 G. Gaslini Children's Hospital Tumor-targeted drug delivery systems and uses thereof
FR2984364A1 (en) * 2011-12-20 2013-06-21 Biomerieux Sa METHOD FOR IN VITRO DIAGNOSIS OR PROGNOSIS OF OVARIAN CANCER
WO2013093347A3 (en) * 2011-12-20 2013-10-24 bioMérieux A method for the in vitro diagnosis or prognosis of ovarian cancer
CN104169434A (en) * 2011-12-20 2014-11-26 拜奥默里克斯公司 Method for in vitro diagnosis or prognosis of ovarian cancer
CN110541031A (en) * 2011-12-20 2019-12-06 拜奥默里克斯公司 Method for in vitro diagnosis or prognosis of ovarian cancer
US11453920B2 (en) 2011-12-20 2022-09-27 Biomerieux Method for the in vitro diagnosis or prognosis of ovarian cancer

Also Published As

Publication number Publication date
EP1366166A2 (en) 2003-12-03
WO2002079473A3 (en) 2003-10-02
CA2434677A1 (en) 2002-10-10

Similar Documents

Publication Publication Date Title
US20040115629A1 (en) Molecules for diagnostics and therapeutics
WO2004023973A2 (en) Molecules for diagnostics and therapeutics
CA2442705A1 (en) Molecules for diagnostics and therapeutics
JP2004528003A (en) Extracellular matrix and cell adhesion molecules
US20040014087A1 (en) Molecules for diagnostics and therapeutics
EP1268758A2 (en) Molecules for diagnostics and therapeutics
US20040048253A1 (en) Molecules for diagnostics and therapeutics
JP2003529325A (en) Human transport protein
JP2004500114A (en) Transcription factor
WO2003062376A2 (en) Molecules for diagnostics and therapeutics
WO2001062927A2 (en) Polypeptides and corresponding polynucleotides for diagnostics and therapeutics
JP2003532419A (en) Cytoskeletal binding protein
WO2003054219A2 (en) Nucleic acid-associated proteins
JP2005503751A (en) Extracellular matrix and cell adhesion molecules
WO2002020754A2 (en) Molecules for diagnostics and therapeutics
EP1265998A2 (en) Polypeptides and corresponding polynucleotides for diagnostics and therapeutics
WO2002079473A2 (en) Molecules for diagnostics and therapeutics
JP2005508636A (en) Nucleic acid binding protein
JP2004511208A (en) RNA metabolism protein
WO2003062385A2 (en) Secretory molecules
JP2004528002A (en) Secretory and transport molecules
WO2001021836A2 (en) Molecules for diagnostics and therapeutics
US20040044184A1 (en) Cytoskeleton-associated proteins
JP2004509610A (en) Nuclear hormone receptor
JP2004500813A (en) Vesicle transport protein

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 2434677

Country of ref document: CA

Ref document number: 10250889

Country of ref document: US

WWE Wipo information: entry into national phase

Ref document number: 2002733781

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWP Wipo information: published in national office

Ref document number: 2002733781

Country of ref document: EP

WWW Wipo information: withdrawn in national office

Ref document number: 2002733781

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP