EP0941366A2 - Biallelic markers - Google Patents
- Publication number
- EP0941366A2 EP97946582A
- Authority
- EP
- European Patent Office
- Prior art keywords
- polymorphic
- segment
- allele
- column
- nucleic acid
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/156—Polymorphic or mutational markers
Definitions
- the variant form may confer an evolutionary advantage or disadvantage relative to a progenitor form or may be neutral.
- a variant form confers a lethal disadvantage and is not transmitted to subsequent generations of the organism.
- a variant form confers an evolutionary advantage to the species and is eventually incorporated into the DNA of many or most members of the species and effectively becomes the progenitor form.
- a restriction fragment length polymorphism is a variation in DNA sequence that alters the length of a restriction fragment (Botstein et al., Am. J. Hum. Genet. 32, 314-331 (1980)).
- the restriction fragment length polymorphism may create or delete a restriction site, thus changing the length of the restriction fragment.
- RFLPs have been widely used in human and animal genetic analyses (see WO 90/13668; WO 90/11369; Donis-Keller, Cell 51, 319-337 (1987); Lander et al., Genetics 121, 85-99 (1989)).
- the presence of the RFLP in an individual can be used to predict the likelihood that the animal will also exhibit the trait.
- VNTR: variable number tandem repeat
- STRs: short tandem repeats
- VNTRs have been used in identity and paternity analysis (US 5,075,217; Armour et al., FEBS Lett. 307, 113-115 (1992); Horn et al., WO 91/14003; Jeffreys, EP 370,719), and in a large number of genetic mapping studies.
- Other polymorphisms take the form of single nucleotide variations between individuals of the same species.
- polymorphisms are far more frequent than RFLPs, STRs and VNTRs.
- Some single nucleotide polymorphisms occur in protein-coding sequences, in which case one of the polymorphic forms may give rise to the expression of a defective or other variant protein and, potentially, a genetic disease. Examples of genes in which polymorphisms within coding sequences give rise to genetic disease include β-globin (sickle cell anemia) and CFTR (cystic fibrosis).
- Other single nucleotide polymorphisms occur in noncoding regions. Some of these polymorphisms may also result in defective protein expression (e.g., as a result of defective splicing). Other single nucleotide polymorphisms have no phenotypic effects.
- Single nucleotide polymorphisms can be used in the same manner as RFLPs and VNTRs, but offer several advantages. Single nucleotide polymorphisms occur with greater frequency and are spaced more uniformly throughout the genome than other forms of polymorphism. The greater frequency and uniformity of single nucleotide polymorphisms mean that there is a greater probability that such a polymorphism will be found in close proximity to a genetic locus of interest than would be the case for other polymorphisms. The different forms of characterized single nucleotide polymorphisms are often easier to distinguish than other types of polymorphism (e.g., by use of assays employing allele-specific hybridization probes or primers).
- the invention provides nucleic acid sequences comprising nucleic acid segments of from about 10 to about 200 bases as shown in the Table, column 7, including a polymorphic site. Complements of these segments are also included.
- the segments can be DNA or RNA, and can be double- or single-stranded. Segments can be, for example, 10-20, 10-50 or 10-100 bases long. Preferred segments include a biallelic polymorphic site. The base occupying the polymorphic site in the segments can be the reference
- the invention further provides allele-specific oligonucleotides that hybridize to a segment of a fragment shown in the Table, column 7, or its complement. These oligonucleotides can be probes or primers. Also provided are isolated nucleic acids comprising a sequence shown in the Table, column 7, or the complement thereto, in which the polymorphic site within the sequence is occupied by a base other than the reference base shown in the Table, column 3.
- the invention further provides a method of analyzing a nucleic acid from an individual.
- the method determines which base is present at any one of the polymorphic sites shown in the Table.
- a set of bases occupying a set of the polymorphic sites shown in the Table is determined. This type of analysis can be performed on a number of individuals, who are tested for the presence of a disease phenotype. The presence or absence of disease phenotype is then correlated with a base or set of bases present at the polymorphic sites in the individuals tested.
- An oligonucleotide can be DNA or RNA, and single- or double-stranded. Oligonucleotides can be naturally occurring or synthetic, but are typically prepared by synthetic means.
- the oligonucleotides of the present invention can comprise all of an oligonucleotide sequence presented in column 7 of the Table or a segment of such an oligonucleotide which includes a polymorphic site.
- Oligonucleotides can be all of a nucleic acid segment as represented in column 7 of the Table; a nucleic acid sequence which comprises a nucleic acid segment represented in column 7 of the Table and additional nucleic acids (present at either or both ends of a nucleic acid segment of column 7); or a portion (fragment) of a nucleic acid segment represented in column 7 of the Table which includes a polymorphic site.
- Preferred oligonucleotides of the invention include segments of DNA, or their complements, which include any one of the polymorphic sites shown in the Table. The segments can be between 5 and 250 bases, and, in specific embodiments, are between 5-10, 5-20, 10-20, 10-50, 20-50 or 10-100 bases.
- the polymorphic site can occur within any position of the segment.
- the segments can be from any of the allelic forms of DNA shown in the Table.
- Hybridization probes are oligonucleotides which bind in a base-specific manner to a complementary strand of nucleic acid. Such probes include peptide nucleic acids, as described in Nielsen et al., Science 254, 1497-1500 (1991).
- primer refers to a single-stranded oligonucleotide which acts as a point of initiation of template-directed DNA synthesis under appropriate conditions (e.g.
- primer site refers to the area of the target DNA to which a primer hybridizes.
- primer pair refers to a set of primers including a 5' (upstream) primer that hybridizes with the 5' end of the DNA sequence to be amplified and a 3' (downstream) primer that hybridizes with the complement of the 3' end of the sequence to be amplified.
- linkage describes the tendency of genes, alleles, loci or genetic markers to be inherited together as a result of their location on the same chromosome. It can be measured by percent recombination between the two genes, alleles, loci or genetic markers.
- polymorphism refers to the occurrence of two or more genetically determined alternative sequences or alleles in a population.
- a polymorphic marker or site is the locus at which divergence occurs. Preferred markers have at least two alleles, each occurring at frequency of greater than 1%, and more preferably greater than 10% or 20% of a selected population.
- a polymorphic locus may be as small as one base pair.
- Polymorphic markers include restriction fragment length polymorphisms, variable number of tandem repeats (VNTR's), hypervariable regions, minisatellites, dinucleotide repeats, trinucleotide repeats, tetranucleotide repeats, simple sequence repeats, and insertion elements such as Alu.
- one allelic form is arbitrarily designated as the reference form and other allelic forms are designated as alternative or variant alleles.
- the allelic form occurring most frequently in a selected population is sometimes referred to as the wildtype form. Diploid organisms may be homozygous or heterozygous for allelic forms.
- a diallelic or biallelic polymorphism has two forms.
- a triallelic polymorphism has three forms.
- a single nucleotide polymorphism occurs at a polymorphic site occupied by a single nucleotide, which is the site of variation between allelic sequences. The site is usually preceded by and followed by highly conserved sequences of the allele (e.g., sequences that vary in less than 1/100 or 1/1000 members of the populations).
- a single nucleotide polymorphism usually arises due to substitution of one nucleotide for another at the polymorphic site.
- a transition is the replacement of one purine by another purine or one pyrimidine by another pyrimidine.
- a transversion is the replacement of a purine by a pyrimidine or vice versa.
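The two definitions above can be sketched as a small classifier. This is only an illustration: the function name `classify_substitution` is hypothetical and is not part of the patent text.

```python
# Purines and pyrimidines among the four DNA bases.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or a transversion.

    A transition replaces a purine with a purine or a pyrimidine with a
    pyrimidine; a transversion swaps one class for the other.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= PURINES | PYRIMIDINES:
        raise ValueError("expected two different bases from A, C, G, T")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"
```

For example, `classify_substitution("A", "G")` gives a transition, while `classify_substitution("A", "C")` gives a transversion.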
- Single nucleotide polymorphisms can also arise from a deletion of a nucleotide or an insertion of a nucleotide relative to a reference allele.
- the polymorphic site is occupied by a base other than the reference base. For example, where the reference allele contains the base "T" at the polymorphic site, the altered allele can contain a "C", "G" or "A" at the polymorphic site.
- Hybridizations are usually performed under stringent conditions, for example, at a salt concentration of no more than 1 M and a temperature of at least 25 °C.
- for example, conditions of 5X SSPE (750 mM NaCl, 50 mM NaPhosphate, 5 mM EDTA, pH 7.4) and a temperature of 25-30°C, or equivalent conditions, are suitable for allele-specific probe hybridizations.
- Equivalent conditions can be determined by varying one or more of the parameters given as an example, as known in the art, while maintaining a similar degree of identity or similarity between the target nucleotide sequence and the primer or probe used.
- an isolated nucleic acid of the invention may be substantially isolated with respect to the complex cellular milieu in which it naturally occurs.
- the isolated material will form part of a composition (for example, a crude extract containing other substances), buffer system or reagent mix.
- the material may be purified to essential homogeneity, for example as determined by PAGE or column chromatography such as HPLC.
- an isolated nucleic acid comprises at least about 50, 80 or 90 percent (on a molar basis) of all macromolecular species present.
- the novel polymorphisms of the invention are listed in the Table.
- the first column of the Table lists the names assigned to the fragments in which the polymorphisms occur.
- the fragments are all human genomic fragments.
- the sequence of one allelic form of each of the fragments (arbitrarily referred to as the prototypical or reference form) has been previously published. These sequences are listed at http://www-genome.wi.mit.edu/ (all STS's (sequence tag sites)); http://shgc.stanford.edu (Stanford STS's); and http://ww.tigr.org/ (TIGR STS's).
- the Web sites also list primers for amplification of the fragments, and the genomic location of fragments. Some fragments are expressed sequence tags, and some are random genomic fragments. All information in the websites concerning the fragments listed in the Table is incorporated by reference in its entirety for all purposes.
- the second column lists the position in the fragment in which a polymorphic site has been found. Positions are numbered consecutively with the first base of the fragment sequence as listed in one of the above databases being assigned the number one.
- the third column lists the base occupying the polymorphic site in the sequence in the database. This base is arbitrarily designated the reference or prototypical form, but it is not necessarily the most frequently occurring form.
- the fourth column in the Table lists the alternative base(s) at the polymorphic site.
- the fifth column of the Table lists a 5' (upstream or forward) primer that hybridizes with the 5' end of the DNA sequence to be amplified.
- the sixth column of the Table lists a 3' (downstream or reverse) primer that hybridizes with the complement of the 3' end of the sequence to be amplified.
- the seventh column of the Table lists a number of bases of sequence on either side of the polymorphic site in each fragment.
- the indicated sequences can be either DNA or RNA. In the latter, the T's shown in the Table are replaced by U's.
- the base occupying the polymorphic site is indicated in IUPAC-IUB ambiguity code.
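As an illustration of how a column-7 style sequence carrying an ambiguity code can be expanded into its allelic forms, here is a minimal sketch. The mapping is the standard IUPAC nucleotide code; the function name `expand_alleles` and the example sequence are hypothetical and are not taken from the Table.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_alleles(sequence: str) -> list[str]:
    """Return every unambiguous sequence represented by an IUPAC-coded one."""
    choices = [IUPAC[base] for base in sequence.upper()]
    return ["".join(combo) for combo in product(*choices)]


# Hypothetical flanking sequence with an A/G polymorphic site coded as R.
print(expand_alleles("ACRT"))
```

A biallelic site expands to exactly two sequences, one per allelic form.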
- tissue samples include whole blood, semen, saliva, tears, urine, fecal material, sweat, buccal, skin and hair.
- For assay of cDNA or mRNA, the tissue sample must be obtained from an organ in which the target nucleic acid is expressed. For example, if the target nucleic acid is a cytochrome P450, the liver is a suitable source.
- PCR DNA Amplification
- PCR Protocols: A Guide to Methods and Applications (eds. Innis et al., Academic Press, San Diego, CA, 1990); Mattila et al., Nucleic Acids Res. 19, 4967 (1991); Eckert et al., PCR Methods and Applications 1, 17 (1991); PCR (eds. McPherson et al., IRL Press, Oxford); and U.S. Patent 4,683,202.
- LCR ligase chain reaction
- NASBA nucleic acid based sequence amplification
- the latter two amplification methods involve isothermal reactions based on isothermal transcription, which produce both single stranded RNA (ssRNA) and double stranded DNA (dsDNA) as the amplification products in a ratio of about 30 or 100 to 1, respectively.
- ssRNA single stranded RNA
- dsDNA double stranded DNA
- the first type of analysis is carried out to identify polymorphic sites not previously characterized (i.e., to identify new polymorphisms) .
- This analysis compares target sequences in different individuals to identify points of variation, i.e., polymorphic sites.
- by analyzing groups of individuals representing the greatest ethnic diversity among humans and greatest breed and species variety in plants and animals, patterns characteristic of the most common alleles/haplotypes of the locus can be identified, and the frequencies of such alleles/haplotypes in the population can be determined. Additional allelic frequencies can be determined for subpopulations characterized by criteria such as geography, race, or gender.
- the de novo identification of polymorphisms of the invention is described in the Examples section.
- the second type of analysis determines which form(s) of a characterized (known) polymorphism are present in individuals under test. There are a variety of suitable procedures, which are discussed in turn.
- The design and use of allele-specific probes for analyzing polymorphisms is described by, e.g., Saiki et al., Nature 324, 163-166 (1986); Dattagupta, EP 235,726; and Saiki, WO 89/11548. Allele-specific probes can be designed that hybridize to a segment of target DNA from one individual but do not hybridize to the corresponding segment from another individual due to the presence of different polymorphic forms in the respective segments from the two individuals. Hybridization conditions should be sufficiently stringent that there is a significant difference in hybridization intensity between alleles, and preferably an essentially binary response, whereby a probe hybridizes to only one of the alleles.
- Some probes are designed to hybridize to a segment of target DNA such that the polymorphic site aligns with a central position (e.g., in a 15-mer at the 7 position; in a 16-mer, at either the 8 or 9 position) of the probe. This design of probe achieves good discrimination in hybridization between different allelic forms.
- Allele-specific probes are often used in pairs, one member of a pair showing a perfect match to a reference form of a target sequence and the other member showing a perfect match to a variant form. Several pairs of probes can then be immobilized on the same support for simultaneous analysis of multiple polymorphisms within the same target sequence.
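The paired, centrally aligned probe design described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the example target, and the choice to place the polymorphic site at zero-based offset `length // 2` (offset 7 in a 15-mer) are assumptions, not the patent's procedure.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def probe_pair(target: str, site: int, alt_base: str, length: int = 15):
    """Build a (reference, variant) probe pair centred on a polymorphic site.

    `site` is the zero-based index of the polymorphic base in `target`.
    Each probe is the reverse complement of the target segment it detects,
    so the two probes differ only at the central position.
    """
    offset = length // 2              # zero-based position of the site in the segment
    start = site - offset
    if start < 0 or start + length > len(target):
        raise ValueError("site is too close to an end of the target")
    ref_segment = target[start:start + length]
    alt_segment = ref_segment[:offset] + alt_base + ref_segment[offset + 1:]
    return reverse_complement(ref_segment), reverse_complement(alt_segment)


# Hypothetical target with a G at index 10 and an alternative base A.
ref_probe, alt_probe = probe_pair("ACGTACGTACGTACGTACGT", 10, "A")
```

The two probes have equal length and a single central difference, which is the property the text credits with good allelic discrimination.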
- the polymorphisms can also be identified by hybridization to nucleic acid arrays, some examples of which are described in WO 95/11995.
- One form of such arrays is described in the Examples section in connection with de novo identification of polymorphisms.
- the same array or a different array can be used for analysis of characterized polymorphisms.
- WO 95/11995 also describes subarrays that are optimized for detection of a variant form of a precharacterized polymorphism.
- Such a subarray contains probes designed to be complementary to a second reference sequence, which is an allelic variant of the first reference sequence.
- the second group of probes is designed by the same principles as described in the Examples, except that the probes exhibit complementarity to the second reference sequence.
- a second group can be particularly useful for analyzing short subsequences of the primary reference sequence in which multiple mutations are expected to occur within a short distance commensurate with the length of the probes (e.g., two or more mutations within 9 to 21 bases).
- Allele-Specific Primers. An allele-specific primer hybridizes to a site on target DNA overlapping a polymorphism and only primes amplification of an allelic form to which the primer exhibits perfect complementarity. See Gibbs, Nucleic Acid Res. 17, 2427-2448 (1989). This primer is used in conjunction with a second primer which hybridizes at a distal site. Amplification proceeds from the two primers, resulting in a detectable product which indicates the particular allelic form is present. A control is usually performed with a second pair of primers, one of which shows a single base mismatch at the polymorphic site and the other of which exhibits perfect complementarity to a distal site.
- the single-base mismatch prevents amplification and no detectable product is formed.
- the method works best when the mismatch is included in the 3'-most position of the oligonucleotide aligned with the polymorphism because this position is most destabilizing to elongation from the primer (see, e.g., WO 93/22456).
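A minimal sketch of the primer layout described above, in which the 3'-most base of each allele-specific primer sits on the polymorphic site. The function name, the default primer length, and the example sequence are hypothetical. Primers are written 5' to 3' on the same strand as `target`, and practical design constraints such as melting temperature are ignored.

```python
def allele_specific_primers(target: str, site: int, alleles: str, length: int = 20):
    """Return one forward primer per allele, each ending at the polymorphic site.

    `site` is the zero-based index of the polymorphic base in `target`.
    All primers share the upstream bases and differ only at the 3'-most
    position, which carries the allele-specific base.
    """
    start = site - length + 1
    if start < 0:
        raise ValueError("not enough upstream sequence for this primer length")
    upstream = target[start:site]
    return {allele: upstream + allele for allele in alleles}


# Hypothetical target: reference base A at index 12, variant base G.
primers = allele_specific_primers("ACGTACGTACGTACGTACGTACGT", 12, "AG", length=10)
```

Each primer would then be paired with a common primer at a distal site, and only the primer matching the allele present is expected to yield product.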
- the direct analysis of the sequence of polymorphisms of the present invention can be accomplished using either the dideoxy chain termination method or the Maxam-Gilbert method (see Sambrook et al., Molecular Cloning, A Laboratory Manual (2nd Ed., CSHP, New York 1989); Zyskind et al., Recombinant DNA Laboratory Manual (Acad. Press, 1988)).
- 5. Denaturing Gradient Gel Electrophoresis. Amplification products generated using the polymerase chain reaction can be analyzed by the use of denaturing gradient gel electrophoresis. Different alleles can be identified based on the different sequence-dependent melting properties and electrophoretic migration of DNA in solution. See Erlich, ed., PCR Technology, Principles and Applications for DNA Amplification (W.H. Freeman and Co, New York, 1992), Chapter 7.
- Alleles of target sequences can be differentiated using single-strand conformation polymorphism analysis, which identifies base differences by alteration in electrophoretic migration of single stranded PCR products, as described in Orita et al., Proc. Nat. Acad. Sci. 86,
- Amplified PCR products can be generated as described above, and heated or otherwise denatured, to form single stranded amplification products.
- Single-stranded nucleic acids may refold or form secondary structures which are partially dependent on the base sequence.
- the different electrophoretic mobilities of single-stranded amplification products can be related to base-sequence differences between alleles of target sequences.
- polymorphisms of the invention are often used in conjunction with polymorphisms in distal genes.
- Preferred polymorphisms for use in forensics are biallelic because the population frequencies of two polymorphic forms can usually be determined with greater accuracy than those of multiple polymorphic forms at multi-allelic loci.
- the capacity to identify a distinguishing or unique set of forensic markers in an individual is useful for forensic analysis. For example, one can determine whether a blood sample from a suspect matches a blood or other tissue sample from a crime scene by determining whether the set of polymorphic forms occupying selected polymorphic sites is the same in the suspect and the sample. If the set of polymorphic markers does not match between a suspect and a sample, it can be concluded (barring experimental error) that the suspect was not the source of the sample. If the set of markers does match, one can conclude that the DNA from the suspect is consistent with that found at the crime scene.
- if frequencies of the polymorphic forms at the loci tested have been determined (e.g., by analysis of a suitable population of individuals), one can perform a statistical analysis to determine the probability that a match of suspect and crime scene sample would occur by chance.
- p(ID) is the probability that two random individuals have the same polymorphic or allelic form at a given polymorphic site. In biallelic loci, four genotypes are possible: AA, AB, BA, and BB. If alleles A and B occur in a haploid genome of the organism with frequencies x and y, the probability of each genotype in a diploid organism is
- the cumulative probability of identity (cum p(ID)) for each of multiple unlinked loci is determined by multiplying the probabilities provided by each locus.
- $\text{cum } p(ID) = p(ID_1)\, p(ID_2)\, p(ID_3) \cdots$
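A small worked sketch of the probability-of-identity calculation. The per-genotype probabilities are truncated in the text above, so the code assumes Hardy-Weinberg proportions (x² for AA, 2xy for the two heterozygote orders combined, y² for BB) and treats AB and BA as indistinguishable. These assumptions, the function names, and the example frequencies are illustrative only.

```python
from math import prod


def p_identity(x: float) -> float:
    """Probability that two random individuals share a genotype at one biallelic locus.

    Assumes Hardy-Weinberg proportions with allele frequencies x and y = 1 - x.
    """
    y = 1.0 - x
    return (x * x) ** 2 + (2 * x * y) ** 2 + (y * y) ** 2


def cumulative_p_identity(frequencies) -> float:
    """Multiply the per-locus probabilities, as for unlinked loci."""
    return prod(p_identity(x) for x in frequencies)


# Hypothetical allele-A frequencies at three unlinked biallelic loci.
print(cumulative_p_identity([0.5, 0.5, 0.3]))
```

With x = 0.5 a single locus gives p(ID) = 0.375, the minimum for a biallelic site, so the cumulative value falls quickly as informative loci are added.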
- the object of paternity testing is usually to determine whether a male is the father of a child. In most cases, the mother of the child is known and thus, the mother's contribution to the child's genotype can be traced. Paternity testing investigates whether the part of the child's genotype not attributable to the mother is consistent with that of the putative father. Paternity testing can be performed by analyzing sets of polymorphisms in the putative father and the child. If the set of polymorphisms in the child attributable to the father does not match the set of polymorphisms of the putative father, it can be concluded, barring experimental error, that the putative father is not the real father. If the set of polymorphisms in the child attributable to the father does match the set of polymorphisms of the putative father, a statistical calculation can be performed to determine the probability of coincidental match.
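The exclusion logic in the paragraph above can be sketched as follows. This is an illustration only: genotypes are written as two-character strings such as "AB", the function names are hypothetical, and experimental error and new mutations are ignored.

```python
def paternal_candidates(child: str, mother: str) -> set:
    """Alleles the child could have received from the father at one locus.

    If one child allele can be attributed to the mother, the other allele
    must be paternal.
    """
    first, second = child
    candidates = set()
    if first in mother:
        candidates.add(second)
    if second in mother:
        candidates.add(first)
    return candidates


def is_excluded(child_loci, mother_loci, father_loci) -> bool:
    """Return True if the putative father is excluded at any tested locus."""
    for child, mother, father in zip(child_loci, mother_loci, father_loci):
        candidates = paternal_candidates(child, mother)
        if not candidates:
            raise ValueError("child genotype is inconsistent with the mother")
        if not candidates & set(father):
            return True
    return False
```

A result of `False` means only that the putative father is not excluded; the probability of a coincidental match would still need to be calculated from allele frequencies.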
- polymorphisms of the invention may contribute to the phenotype of an organism in different ways. Some polymorphisms occur within a protein coding sequence and contribute to phenotype by affecting protein structure.
- the effect may be neutral, beneficial or detrimental, or both beneficial and detrimental, depending on the circumstances.
- a heterozygous sickle cell mutation confers resistance to malaria, but a homozygous sickle cell mutation is usually lethal.
- Other polymorphisms occur in noncoding regions but may exert phenotypic effects indirectly via influence on replication, transcription, and translation.
- a single polymorphism may affect more than one phenotypic trait.
- a single phenotypic trait may be affected by polymorphisms in different genes. Further, some polymorphisms predispose an individual to a distinct mutation that is causally related to a certain phenotype.
- Phenotypic traits include diseases that have known but hitherto unmapped genetic components (e.g., agammaglobulinemia, diabetes insipidus, Lesch-Nyhan syndrome, muscular dystrophy, Wiskott-Aldrich syndrome,
- Phenotypic traits also include symptoms of, or susceptibility to, multifactorial diseases of which a component is or may be genetic, such as autoimmune diseases, inflammation, cancer, diseases of the nervous system, and infection by pathogenic microorganisms.
- autoimmune diseases include rheumatoid arthritis, multiple sclerosis, diabetes (insulin-dependent and non-insulin-dependent), systemic lupus erythematosus and Graves disease.
- cancers include cancers of the bladder, brain, breast, colon, esophagus, kidney, leukemia, liver, lung, oral cavity, ovary, pancreas, prostate, skin, stomach and uterus.
- Phenotypic traits also include characteristics such as longevity, appearance (e.g., baldness, obesity), strength, speed, endurance, fertility, and susceptibility or receptivity to particular drugs or therapeutic treatments.
- Correlation is performed for a population of individuals who have been tested for the presence or absence of a phenotypic trait of interest and for polymorphic marker sets.
- a set of polymorphisms (i.e., a polymorphic set)
- the alleles of each polymorphism of the set are then reviewed to determine whether the presence or absence of a particular allele is associated with the trait of interest.
- Correlation can be performed by standard statistical methods such as a chi-squared test, and statistically significant correlations between polymorphic form(s) and phenotypic characteristics are noted.
- allele A1 at polymorphism A correlates with heart disease.
- allele B1 at polymorphism B correlates with increased milk production of a farm animal.
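As a sketch of the kind of test mentioned above, the following computes a Pearson chi-squared statistic for a 2x2 table of allele presence against trait presence. The counts are invented for illustration, the function name is hypothetical, and no continuity correction is applied. The p-value uses the identity that the upper tail of a chi-squared distribution with one degree of freedom equals erfc(sqrt(chi2 / 2)).

```python
from math import erfc, sqrt


def chi_squared_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-squared test for the 2x2 table [[a, b], [c, d]].

    Rows: allele present / absent. Columns: trait present / absent.
    Returns (statistic, p_value) with one degree of freedom.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    return chi2, erfc(sqrt(chi2 / 2))


# Hypothetical counts: 30 of 50 carriers affected, 10 of 50 non-carriers affected.
statistic, p_value = chi_squared_2x2(30, 20, 10, 40)
```

When many polymorphisms are tested against the same trait, the significance threshold would normally be adjusted for multiple comparisons.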
- Such correlations can be exploited in several ways.
- detection of the polymorphic form set in a human or animal patient may justify immediate administration of treatment, or at least the institution of regular monitoring of the patient.
- Detection of a polymorphic form correlated with serious disease in a couple contemplating a family may also be valuable to the couple in their reproductive decisions.
- the female partner might elect to undergo in vitro fertilization to avoid the possibility of transmitting such a polymorphism from her husband to her offspring.
- immediate therapeutic intervention or monitoring may not be justified.
- the patient can be motivated to begin simple life-style changes (e.g., diet, exercise) that can be accomplished at little cost to the patient but confer potential benefits in reducing the risk of conditions to which the patient may have increased susceptibility by virtue of variant alleles.
- Identification of a polymorphic set in a patient correlated with enhanced receptiveness to one of several treatment regimes for a disease indicates that this treatment regime should be followed.
- $Y_{ijkpn} = \mu + YS_i + P_j + X_k + \beta_1 + \dots + \beta_{17} + PE_n + a_n + e_p$
- $Y_{ijkpn}$ is the milk, fat, fat percentage, SNF, SNF percentage, energy concentration, or lactation energy record
- $\mu$ is an overall mean
- $YS_i$ is the effect common to all cows calving in year-season
- $X_k$ is the effect common to cows in either the high or average selection line
- $\beta_1$ to $\beta_{17}$ are the binomial regressions of production record on mtDNA D-loop sequence polymorphisms
- $PE_n$ is permanent environmental effect common to all records of cow n
- $a_n$ is effect of animal n and is composed of the additive genetic contribution of sire and dam breeding values and a Mendelian sampling effect
- $e_p$ is a random residual. It was found that eleven of seventeen polymorphisms tested influenced at least one production trait. Bovines having the best
- D. Genetic Mapping of Phenotypic Traits. The previous section concerns identifying correlations between phenotypic traits and polymorphisms that directly or indirectly contribute to those traits.
- the present section describes identification of a physical linkage between a genetic locus associated with a trait of interest and polymorphic markers that are not associated with the trait, but are in physical proximity with the genetic locus responsible for the trait and co-segregate with it.
- Such analysis is useful for mapping a genetic locus associated with a phenotypic trait to a chromosomal position, and thereby cloning gene(s) responsible for the trait. See Lander et al., Proc. Natl. Acad. Sci.
- Linkage studies are typically performed on members of a family. Available members of the family are characterized for the presence or absence of a phenotypic trait and for a set of polymorphic markers. The distribution of polymorphic markers in an informative meiosis is then analyzed to determine which polymorphic markers co-segregate with a phenotypic trait. See, e.g., Kerem et al., Science 245, 1073-1080 (1989); Monaco et al., Nature 316, 842 (1985); Yamoka et al., Neurology 40, 222-226 (1990); Rossiter et al., FASEB Journal 5, 21-27 (1991).
- LOD log of the odds
- the likelihood at a given value of θ (the recombination fraction) is the ratio of the probability of the data if the loci are linked at θ to the probability of the data if the loci are unlinked.
- the computed likelihoods are usually expressed as the log10 of this ratio (i.e., a lod score).
- a lod score of 3 indicates 1000:1 odds against an apparent observed linkage being a coincidence.
- the use of logarithms allows data collected from different families to be combined by simple addition. Computer programs are available for the calculation of lod scores for differing values of θ (e.g., LIPED, MLINK (Lathrop, Proc. Nat. Acad. Sci. (USA) 81, 3443-3446 (1984))).
- a recombination fraction may be determined from mathematical tables. See Smith et al., Mathematical tables for research workers in human genetics (Churchill, London, 1961); Smith, Ann. Hum. Genet. 32, 127-150 (1968). The value of θ at which the lod score is the highest is considered to be the best estimate of the recombination fraction.
- Positive lod score values suggest that the two loci are linked, whereas negative values suggest that linkage is less likely (at that value of θ) than the possibility that the two loci are unlinked.
- a combined lod score of +3 or greater is considered definitive evidence that two loci are linked.
- a negative lod score of -2 or less is taken as definitive evidence against linkage of the two loci being compared.
- Negative linkage data are useful in excluding a chromosome or a segment thereof from consideration. The search focuses on the remaining non-excluded chromosomal locations.
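The lod-score reasoning above can be illustrated with a minimal sketch for the simplest case. It assumes phase-known, fully informative meioses, so that each meiosis can be scored directly as recombinant or non-recombinant; programs such as those cited handle general pedigrees. The function names and the counts are hypothetical.

```python
from math import log10


def lod_score(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """log10 of the likelihood at recombination fraction theta over that at 0.5."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    n = recombinants + nonrecombinants
    if theta == 0.0:
        # Any observed recombinant is impossible under complete linkage.
        return n * log10(2) if recombinants == 0 else float("-inf")
    return (recombinants * log10(theta)
            + nonrecombinants * log10(1.0 - theta)
            - n * log10(0.5))


def combined_lod(families, theta: float) -> float:
    """Add lod scores across families, each given as (recombinants, nonrecombinants)."""
    return sum(lod_score(r, nr, theta) for r, nr in families)


def best_theta(recombinants: int, nonrecombinants: int, steps: int = 50) -> float:
    """Grid search for the theta in [0, 0.5] that maximises the lod score."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda t: lod_score(recombinants, nonrecombinants, t))


# Hypothetical data: 2 recombinants among 20 informative meioses.
print(best_theta(2, 18), lod_score(2, 18, 0.1))
```

Ten non-recombinant meioses with no recombinants give a lod score of about 3.01 at θ = 0, which is just over the +3 threshold quoted above.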
- the invention further provides variant forms of nucleic acids and corresponding proteins.
- the nucleic acids comprise one of the sequences described in the Table, column 8, in which the polymorphic position is occupied by one of the alternative bases for that position. Some nucleic acids encode full-length variant forms of proteins.
- variant proteins have the prototypical amino acid sequences encoded by nucleic acid sequences shown in the Table, column 8 (read so as to be in-frame with the full-length coding sequence of which it is a component), except at an amino acid encoded by a codon including one of the polymorphic positions shown in the Table. That position is occupied by the amino acid coded by the corresponding codon in any of the alternative forms shown in the Table.
- Variant genes can be expressed in an expression vector in which a variant gene is operably linked to a native or other promoter.
- the promoter is a eukaryotic promoter for expression in a mammalian cell.
- the transcription regulation sequences typically include a heterologous promoter and optionally an enhancer which is recognized by the host.
- the selection of an appropriate promoter for example trp, lac, phage promoters, glycolytic enzyme promoters and tRNA promoters, depends on the host selected.
- Commercially available expression vectors can be used. Vectors can include host-recognized replication systems, amplifiable genes, selectable markers, host sequences useful for insertion into the host genome, and the like.
- the means of introducing the expression construct into a host cell varies depending upon the particular construction and the target host. Suitable means include fusion, conjugation, transfection, transduction, electroporation or injection, as described in Sambrook, supra .
- a wide variety of host cells can be employed for expression of the variant gene, both prokaryotic and eukaryotic. Suitable host cells include bacteria such as E. coli, yeast, filamentous fungi, insect cells, and mammalian cells, typically immortalized, e.g., mouse, CHO, human and monkey cell lines and derivatives thereof. Preferred host cells are able to process the variant gene product to produce an appropriate mature polypeptide.
- the protein may be isolated by conventional means of protein biochemistry and purification to obtain a substantially pure product, i.e., 80, 95 or 99% free of cell component contaminants, as described in Jacoby, Methods in Enzymology Volume 104, Academic Press, New York (1984); Scopes, Protein Purification, Principles and Practice, 2nd Edition, Springer-Verlag, New York (1987); and Deutscher (ed.), Guide to Protein Purification, Methods in Enzymology, Vol. 182 (1990). If the protein is secreted, it can be isolated from the supernatant in which the host cell is grown. If not secreted, the protein can be isolated from a lysate of the host cells.
- the invention further provides transgenic nonhuman animals capable of expressing an exogenous variant gene and/or having one or both alleles of an endogenous variant gene inactivated.
- Expression of an exogenous variant gene is usually achieved by operably linking the gene to a promoter and optionally an enhancer, and microinjecting the construct into a zygote.
- Inactivation of endogenous variant genes can be achieved by forming a transgene in which a cloned variant gene is inactivated by insertion of a positive selection marker. See Capecchi, Science 244, 1288-1292 (1989).
- the transgene is then introduced into an embryonic stem cell, where it undergoes homologous recombination with an endogenous variant gene. Mice and other rodents are preferred animals. Such animals provide useful drug screening systems.
- the present invention includes biologically active fragments of the polypeptides, or analogs thereof, including organic molecules which simulate the interactions of the peptides.
- Biologically active fragments include any portion of the full-length polypeptide which confers a biological function on the variant gene product, including ligand binding, and antibody binding.
- Ligand binding includes binding by nucleic acids, proteins or polypeptides, small biologically active molecules, or large cellular structures.
- Antibodies that specifically bind to variant gene products but not to corresponding prototypical gene products are also provided.
- Antibodies can be made by injecting mice or other animals with the variant gene product or synthetic peptide fragments thereof. Monoclonal antibodies are screened as described, for example, in Harlow & Lane, Antibodies, A Laboratory Manual, Cold Spring Harbor Press, New York (1988); Goding, Monoclonal antibodies, Principles and Practice (2d ed.) Academic Press, New York (1986). Monoclonal antibodies are tested for specific immunoreactivity with a variant gene product and lack of immunoreactivity to the corresponding prototypical gene product. These antibodies are useful in diagnostic assays for detection of the variant form, or as an active ingredient in a pharmaceutical composition.
- the invention further provides kits comprising at least one allele-specific oligonucleotide as described above. Often, the kits contain one or more pairs of allele-specific oligonucleotides hybridizing to different forms of a polymorphism. In some kits, the allele-specific oligonucleotides are provided immobilized to a substrate.
- the same substrate can comprise allele-specific oligonucleotide probes for detecting at least 10, 100 or all of the polymorphisms shown in the Table.
- kits include, for example, restriction enzymes, reverse-transcriptase or polymerase, the substrate nucleoside triphosphates, means used to label (for example, an avidin-enzyme conjugate and enzyme substrate and chromogen if the label is biotin), and the appropriate buffers for reverse transcription, PCR, or hybridization reactions.
- the kit also contains instructions for carrying out the methods.
- the polymorphisms shown in the Table were identified by resequencing of target sequences from three to ten unrelated individuals of diverse ethnic and geographic backgrounds by hybridization to probes immobilized to microfabricated arrays or conventional sequencing.
- the strategy and principles for design and use of such arrays are generally described in WO 95/11995.
- the strategy provides arrays of probes for analysis of target sequences showing a high degree of sequence identity to the reference sequences of the fragments shown in the Table, column 1.
- the reference sequences were sequence-tagged sites (STSs) developed in the course of the Human Genome Project (see, e.g., Science 270, 1945-1954 (1995); Nature 380, 152-154 (1996)).
- a typical probe array used in this analysis has two groups of four sets of probes that respectively tile both strands of a reference sequence.
- a first probe set comprises a plurality of probes exhibiting perfect complementarity with one of the reference sequences.
- Each probe in the first probe set has an interrogation position that corresponds to a nucleotide in the reference sequence. That is, the interrogation position is aligned with the corresponding nucleotide in the reference sequence, when the probe and reference sequence are aligned to maximize complementarity between the two.
- For each probe in the first set there are three corresponding probes from three additional probe sets. Thus, there are four probes corresponding to each nucleotide in the reference sequence.
- probes from the three additional probe sets are identical to the corresponding probe from the first probe set except at the interrogation position, which occurs in the same position in each of the four corresponding probes from the four probe sets, and is occupied by a different nucleotide in the four probe sets.
- probes were 25 nucleotides long. Arrays tiled for multiple different reference sequences were included on the same substrate.
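The four-probe-set tiling described above can be sketched for one strand as follows. This is an illustration rather than the array design of WO 95/11995: the function name is hypothetical, the interrogation position is assumed to sit at the centre of each probe, and probes are written aligned base-for-base under the reference (that is, in 3' to 5' orientation) to keep the indexing simple.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def tile_reference(reference: str, probe_length: int = 25) -> dict:
    """Build four probes for each interrogated position of a reference sequence.

    Returns {position: {base: probe}}, where `position` is the zero-based
    reference index aligned with the interrogation position and `base` is the
    nucleotide the probe carries there. One of the four probes is the perfect
    complement of the reference; the other three differ from it only at the
    interrogation position.
    """
    mid = probe_length // 2
    tiles = {}
    for pos in range(mid, len(reference) - (probe_length - mid - 1)):
        window = reference[pos - mid:pos - mid + probe_length]
        perfect = "".join(COMPLEMENT[base] for base in window)
        tiles[pos] = {
            base: perfect[:mid] + base + perfect[mid + 1:] for base in "ACGT"
        }
    return tiles


# Short hypothetical reference and probe length, for readability.
tiles = tile_reference("ACGTACGTAC", probe_length=5)
```

Tiling the opposite strand would repeat the same construction on the reverse complement of the reference, giving the two groups of four probe sets mentioned above.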
- target sequences from an individual were amplified from human genomic DNA using primers for the fragments indicated in the listed Web sites.
- the amplified target sequences were fluorescently labelled during or after PCR.
- the labelled target sequences were hybridized with a substrate bearing immobilized arrays of probes. The amount of label bound to probes was measured. Analysis of the pattern of label revealed the nature and position of differences between the target and reference sequence. For example, comparison of the intensities of four corresponding probes reveals the identity of a corresponding nucleotide in the target sequences aligned with the interrogation position of the probes.
- the corresponding nucleotide is the complement of the nucleotide occupying the interrogation position of the probe showing the highest intensity (see WO 95/11995).
- the existence of a polymorphism is also manifested by differences in normalized hybridization intensities of probes flanking the polymorphism when the probes are hybridized to corresponding targets from different individuals. For example, relative loss of hybridization intensity in a "footprint" of probes flanking a polymorphism signals a difference between the target and reference (i.e., a polymorphism) (see EP 717,113).
- hybridization intensities for corresponding targets from different individuals can be classified into groups or clusters suggested by the data, not defined a priori, such that isolates in a given cluster tend to be similar and isolates in different clusters tend to be dissimilar. Hybridizations to samples from different individuals were performed separately. The Table summarizes the data obtained for target sequences in comparison with a reference sequence for the individuals tested.
- the invention includes a number of general uses that can be expressed concisely as follows.
- the invention provides for the use of any of the nucleic acid segments described above in the diagnosis or monitoring of diseases, such as cancer, inflammation, heart disease, diseases of the CNS, and susceptibility to infection by microorganisms.
- the invention further provides for the use of any of the nucleic acid segments in the manufacture of a medicament for the treatment or prophylaxis of such diseases.
- the invention further provides for the use of any of the DNA segments as a pharmaceutical.
- Wl-7718b 248 AGGAACAAAAAATTACAAAGAACCATGCAGGAAGGAAAACTATGTATT[A/G1AT
- ATrGCACTG GTTTTTGAAATACCTTTGTAGTTACTCAAGC[A/C, ⁇ GTTACTCCCTACACTGATGC AAGGATTACAGAAACTGATGCCAAGGGGCTGAGTGAGTTCAACTACATGTTCTGGGGGCCCGGAGAT AGATGACTTTGCAGATGGAMGAGGTGAAAATGAAGAAGGAAGCTGTGTTGAAACAGAAAAATAAG
- Wl-7718a 42 TCAAAAGGAACAAAAATTACAAAGAACCATGCAGGAAGGAAAACTATGTATTA
- WI-7227C 291 TTATTATGGGAAAGGAAATGGCATTGCTGCTTTCAACCAGCGACTAATGCAAT
- Wl-7227b 93 GTGTTATTATGGGAAAGGAAATGGCATTGCTGCTTTCAACCAGCGACTAATG
- Wl-7227a 24 G GTGTTATTATGGGAAAGGAAATGGCATTGCTGCTTTCAACCAGCGACTAATG j CCACAATGCCTCTCCCACGATGTCAAGGACTCCTGTCTGTCCTGGAGGTGGGAGACAAGGAACCTCCG
- Wl-1 95b 1 30 AGTGAGCTGGGGAAGGCAGGATTT
- Wl-1126b 230 AAAATGCAAATCCAGCTGT CTTTTT[T/C
- Wl-3429b 64 TCCTGACTGTTAACAAGCACTCCAGGCAATTCTTAAGACCAAGCACGGAGC
- Wl-3429a 62 TCCTGACTGTTAACAAGCACTCCAGGCAATTCTTAAGACCAAGCACGGAGC
- Wl-6786b 1 1 1 TTTTTGGCAGGGGACACTCCTTCTGGGTGCTCTATTGCTCAGTTTCATCATT
- Wl-6786a 1 06 TTTTTGGCAGGGGACACTCCTTCTGGGTGCTCTATTGCTCAGTTTCATCATT
- AAAAGGACAG TTTCCATCTTA CCAGATATCA TTTCATTTCTG CAACATTTATCAAACATGGTAGGGAAMGTTCTCACTCTGCACTATAAAAAGGACAGCCAGATATCA
- CAGAAMTCA ATGAGACCCTGCTTTGMCGTTAMCGTTTTGGMTMTGGAAMGGAGCTAGGACMTTCTTGCTT
- AAAAATTAAC CAGGGTCTTGCTCTGTCTCCCAGGCTAGAGTGAGGTGACACMTCMGACTCACAGTAGCCTCMCCT
- WI-7079 293 TTTTACAGCTCTTGGCAT ⁇ TCCTCGCCTAGGCCTGTGAGGTMCTGGGAT
- Wl-7104b 249 GTGAGGCCTTGCACCAGGTGGGGGCCACAGCACCAGCAGCATCTTTG[CtFJF
- WI-9161 61 1 CCTGGC GGM CTGTCTAGTCTCTCCTGTMGCCAMGMATGMCATTCCA
- Wl-7023b 206 A[C/A]ACACACATTCTTGCTCTACCCAMGCTCTGGCTGGCAGCACTM
- WI-7093 54 GGGAGAGCTCTTGTTATFATTMTATTGTTGCCGCTGTTGTGTTGTTGTTA
- ACTTCTCCC TCTGACCTAGG MAGMCTACAGAGGACGATGTCCAAMCMAAMTGGCATCACCTGTCAAAMTGGAGTTCCACT
- WI-205C 1 46 ATCTTACTTTGTTTAAMMCTGCATATGCCTTTA I I I I I GTTTTAGTTCCC
- Wl-205b 1 46 ATCTTACTTTGTTTAAAAMCTGCATATGCCTTTATTTTTGTTTTAGTTCCC
- WI-1943C 1 65 TACAGGGCACCGNTGAGCATTCCAGATGACTCCAMGCCCCGGCTGGAGTAT
- Wl-1943b 1 65 TACAGGGCACCGNTGAGCATTCCAGATGACTCCAMGCCCCGGCTGGAGTAT
- Wi-6336b 234 GTACCCCAGTGCATTATGTCTTGGTAGAGCC[C/T]TGAGGACACTGACAGT
- Wl-6564b 54 GTTCCTTGGCAGGAGMCATGCATATGACTTTAAMTMAGACCMCA
- Wl-6817 1 45 MGATGTTGGACACCTTGTGTTCAMTCTTGGTTCAGGTGCGGCCTGTGCAG
- Wl-6826b 1 54 TMGCTGMTTGCAMTTATGGCMCACACACTGGACTGGGGTATACGTTG
- WI-6826 1 54 TMGCTGMTTGCAMTTATGGCMCACACACTGGACTGGGGTATACGTTG
- Wl-7056b 1 8
- WI-7136 58 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGTAGCTTTCTATATATG
- WI-7146C 21 0 MCGC[A/G]GTTCATGTACMGGCCCCTCTGCMCTGGAGAGAAMTTA
- WI-7146 202 ICCMCGCAGTTCATGTACMGGCCCCTCTGCMCTGGAGAGAAMTTA
- WI-7153 1 61 AGTACCTATCTTTAMGTATAGTACATTTTACATATGTAAATGGTATGTTT
- Wl-7169b 1 61 TTTCMGTCATCTTAGCAGCTAGGATTCTCAMTGGMGTGTTATATATA
- Wl-7464b 1 68 GAMGMAGCCCTACAMTAGGCCCAGGAGMGCMCGTTCACCMCMTTAT
- Wl-7464a 1 03 GAMGMAGCCCTACAMTAGGCCCAGGAGMGCAACGTTCACCMCMTTAT
- Wl-7506b 1 1 8 GMGMMTATTTTAAMTATTGGACCACTCTTGTTCTACCATCCCTACCCACT
- Wl-7534b 1 43 AGAGTGCTGCTAAM ⁇ GGATTGGTGTGATCTTTTTGGTAGTTGTMTTT
- WI-7534 1 35 AGAGTGCTGCTAAMTTGGATTGGTGTGATCTTTTTGGTAGTTGTMTTT
- Wl-7543b 1 62 CTCTGCAGCCCTCAGATFATTTTTCCTCTGGCTCCTTGGATGTAGTCAGTTA
- Wl-7577g 1 57 ATTGTATMTGTGGCCTGTTATACATGACACTCTTCTGMTTGACTGTATTTC
- WI-7743C 1 06 GAGGGGCAGMCAGCCGCTCCTGTCTGCCAGCCAGCAGCCAGCTCTCAGCC
- Wl-7743 1 06 GAGGGGCAGMCAGCCGCTCCTGTCTGCCAGCCAGCAGCCAGCTCTCAGCC
- Wl-7765b 1 26 ACTCAMCCAMTCACTGMCTTTGCTGAGCCTGTAMATAAMGGTCGGA
- Wl-7774b 1 70 ATGATTGAAMTMTGCTGTCCTTTAGTAGCMGTAAMTGTGTCTTGCT
- Wl-7785b 1 65 TAATTIATTTTGTCCATTGATGTATTTATTTTGTAMTGTATCTTGGTGCTGC
- WI-7789C 84 GCCCTCCTGGTGACTCGGGGGCTGTCTCAGACGACTAGCCCAGGACCCATCT _
- Wl-7830d 1 50 T AGGTTGATCGTTGTGTTGTTRTGCTGCACTTTTTACI I I I I IGCGTGTGGA
- WI-7830C 54 AGGTTGATCGTTGTGTTGTTTFGCTGCACTTTTTAC I I I I I GCGTGTGGA
- Wl-7830b 1 34 AGGTTGATCGTTGTGTTGTTFTGCTGCACTTTTTAC I I I I I GCGTGTGGA
- Wl-7900d 1 28 TATGATGTATTTCTGAGCTAAMCTCMCTATAGMGACATTAAAAGAAATC
- WI-7900C 84 TATGATGTATTTCTGAGCTAAMCTCMCTATAGMGACATTAMAGAAATC
- WI-7900 84 TATGATGTATTTCTGAGCTAAAACTCAACTATAGAAGACATTAAAAGAAATC
- WI-8024C 206 TTCCC[A/G]CTCTAGMCAGCTGGCCCTGGTCGTCAGTACACMGGAMGAGC
- Wl-8024b 206 TTCCC[A/G]CTCTAGMCAGCTGGCCCTGGTCGTCAGTACACMGGMAGAGC
- WI-8321 1 78 TTTTGCTATGGTTCTAGTTFATCMCCTACTTTATTAGCTGMCTGTTGGC
- WI-8321 1 78 TTTFGCTATGGTTCTAGTTTATCMCCTACTTTATTAGCTGMCTGTTGGC
- Wl-8332b 123 AGGTGGAGGGTNTCCGGGGMGCAGTTAGATGAGTTMGTGTGATGCACA
- Wl-8378b 31 1 MCTGCCCCCATGATCCMTCACCTNTCACCAGGCCCCTCCTCCMCACGTGGGG
- WI-8378 308 MCTGCCCCCATGATCCMTCACCTNTCACCAGGCCCCTCCTCCMCACGTGGGG
- WI-8426 1 84 G AGGCTGGGAGTATGGANGGNCCCGGGGCCCTTGGCNATNGNATFCAGTGAG
- Wl-9676h 1 34 AGGCCAGGGTCTCAGCTTTAMGCCTTGGMTCCTATGCATTGTTTGTTT
- Wl-9676d 1 34 AGGCCAGGGTCTCAGCTTTAMGCCTTGGMTCCTATGCATTGTTTGTTT
- WI-9832 1 1 6 A TTTGTMGTGGACTAMGTTTGAGGACCAGACATGGMGGTTGGCTTTGGC
- AAAGCATGAC CGCTTATGTTA AATAAAATGA ATAGTMTTCC CMGTGAATATTGATACATGGCTGACMAGCATGACMTMMTGMCAC[A/G]TACGGGMTTAC
Abstract
The invention provides nucleic acid segments of the human genome including polymorphic sites. Allele-specific primers and probes hybridizing to regions flanking these sites are also provided. The nucleic acids, primers and probes are used in applications such as forensics, paternity testing, medicine and genetic analysis.
Description
BIALLELIC MARKERS
RELATED APPLICATIONS
This application claims priority to U.S. provisional application Serial No. 60/030,455, filed November 6, 1996, the entire teachings of which are incorporated herein by reference.
BACKGROUND OF THE INVENTION
The genomes of all organisms undergo spontaneous mutation in the course of their continuing evolution, generating variant forms of progenitor sequences (Gusella, Ann. Rev. Biochem. 55, 831-854 (1986)). The variant form may confer an evolutionary advantage or disadvantage relative to a progenitor form or may be neutral. In some instances, a variant form confers a lethal disadvantage and is not transmitted to subsequent generations of the organism. In other instances, a variant form confers an evolutionary advantage to the species and is eventually incorporated into the DNA of many or most members of the species and effectively becomes the progenitor form. In many instances, both progenitor and variant form(s) survive and co-exist in a species population. The coexistence of multiple forms of a sequence gives rise to polymorphisms.
Several different types of polymorphism have been reported. A restriction fragment length polymorphism (RFLP) is a variation in DNA sequence that alters the length of a restriction fragment (Botstein et al., Am. J. Hum. Genet. 32, 314-331 (1980)). The restriction fragment length polymorphism may create or delete a restriction site, thus changing the length of the restriction fragment.
RFLPs have been widely used in human and animal genetic analyses (see WO 90/13668; WO 90/11369; Donis-Keller, Cell 51, 319-337 (1987); Lander et al., Genetics 121, 85-99 (1989)). When a heritable trait can be linked to a particular RFLP, the presence of the RFLP in an individual can be used to predict the likelihood that the animal will also exhibit the trait.
Other polymorphisms take the form of short tandem repeats (STRs) that include tandem di-, tri- and tetra-nucleotide repeated motifs. These tandem repeats are also referred to as variable number tandem repeat (VNTR) polymorphisms. VNTRs have been used in identity and paternity analysis (US 5,075,217; Armour et al., FEBS Lett. 307, 113-115 (1992); Horn et al., WO 91/14003; Jeffreys, EP 370,719), and in a large number of genetic mapping studies.
Other polymorphisms take the form of single nucleotide variations between individuals of the same species. Such polymorphisms are far more frequent than RFLPs, STRs and VNTRs. Some single nucleotide polymorphisms occur in protein-coding sequences, in which case one of the polymorphic forms may give rise to the expression of a defective or other variant protein and, potentially, a genetic disease. Examples of genes in which polymorphisms within coding sequences give rise to genetic disease include β-globin (sickle cell anemia) and CFTR (cystic fibrosis). Other single nucleotide polymorphisms occur in noncoding regions. Some of these polymorphisms may also result in defective protein expression (e.g., as a result of defective splicing). Other single nucleotide polymorphisms have no phenotypic effects.
Single nucleotide polymorphisms can be used in the same manner as RFLPs and VNTRs, but offer several advantages. Single nucleotide polymorphisms occur with greater frequency and are spaced more uniformly throughout the genome than other forms of polymorphism. The greater frequency and uniformity of single nucleotide polymorphisms means that there is a greater probability that such a polymorphism will be found in close proximity to a genetic locus of interest than would be the case for other polymorphisms. The different forms of characterized single nucleotide polymorphisms are often easier to distinguish than other types of polymorphism (e.g., by use of assays employing allele-specific hybridization probes or primers).
Only a small percentage of the total repository of polymorphisms in humans and other organisms has been identified. The limited number of polymorphisms identified to date is due to the large amount of work required for their detection by conventional methods. For example, a conventional approach to identifying polymorphisms might be to sequence the same stretch of DNA in a population of individuals by dideoxy sequencing. In this type of approach, the amount of work increases in proportion to both the length of sequence and the number of individuals in a population and becomes impractical for large stretches of DNA or large numbers of persons.
SUMMARY OF THE INVENTION
The invention provides nucleic acid sequences comprising nucleic acid segments of from about 10 to about 200 bases as shown in the Table, column 7, including a polymorphic site. Complements of these segments are also included. The segments can be DNA or RNA, and can be double- or single-stranded. Segments can be, for example, 10-20, 10-50 or 10-100 bases long. Preferred segments include a biallelic polymorphic site. The base occupying the polymorphic site in the segments can be the reference (Table, column 3) or an alternative base (Table, column 4).
The invention further provides allele-specific oligonucleotides that hybridize to a segment of a fragment shown in the Table, column 7, or its complement. These oligonucleotides can be probes or primers. Also provided are isolated nucleic acids comprising a sequence shown in the Table, column 7, or the complement thereto, in which the polymorphic site within the sequence is occupied by a base other than the reference base shown in the Table, column 3.
The invention further provides a method of analyzing a nucleic acid from an individual. The method determines which base is present at any one of the polymorphic sites shown in the Table. Optionally, a set of bases occupying a set of the polymorphic sites shown in the Table is determined. This type of analysis can be performed on a number of individuals, who are tested for the presence of a disease phenotype. The presence or absence of disease phenotype is then correlated with a base or set of bases present at the polymorphic sites in the individuals tested.
DETAILED DESCRIPTION OF THE INVENTION
DEFINITIONS
An oligonucleotide can be DNA or RNA, and single- or double-stranded. Oligonucleotides can be naturally occurring or synthetic, but are typically prepared by synthetic means. The oligonucleotides of the present invention can comprise all of an oligonucleotide sequence presented in column 7 of the Table or a segment of such an oligonucleotide which includes a polymorphic site. Oligonucleotides can be all of a nucleic acid segment as represented in column 7 of the Table; a nucleic acid sequence which comprises a nucleic acid segment represented in column 7 of the Table and additional nucleic acids (present at either or both ends of a nucleic acid segment of column 7); or a portion (fragment) of a nucleic acid segment represented in column 7 of the Table which includes a polymorphic site. Preferred oligonucleotides of the invention include segments of DNA, or their complements, which include any one of the polymorphic sites shown in the Table. The segments can be between 5 and 250 bases, and, in specific embodiments, are between 5-10, 5-20, 10-20, 10-50, 20-50 or 10-100 bases. The polymorphic site can occur within any position of the segment. The segments can be from any of the allelic forms of DNA shown in the Table.
Hybridization probes are oligonucleotides which bind in a base-specific manner to a complementary strand of nucleic acid. Such probes include peptide nucleic acids, as described in Nielsen et al., Science 254, 1497-1500 (1991).
As used herein, the term primer refers to a single-stranded oligonucleotide which acts as a point of initiation of template-directed DNA synthesis under appropriate conditions (e.g., in the presence of four different nucleoside triphosphates and an agent for polymerization, such as DNA or RNA polymerase or reverse transcriptase) in an appropriate buffer and at a suitable temperature. The appropriate length of a primer depends on the intended use of the primer, but typically ranges from 15 to 30 nucleotides. Short primer molecules generally require cooler temperatures to form sufficiently stable hybrid complexes with the template. A primer need not reflect the exact sequence of the template, but must be sufficiently complementary to hybridize with a template. The term primer site refers to the area of the target DNA to which a primer hybridizes. The term primer pair refers to a set of primers including a 5' (upstream) primer that hybridizes with the 5' end of the DNA sequence to be amplified and a 3' (downstream) primer that hybridizes with the complement of the 3' end of the sequence to be amplified.
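The primer-pair definition above can be illustrated with a minimal sketch. It assumes the sequence to be amplified is supplied as one strand written 5' to 3', that the upstream primer copies the 5' end of that strand, and that the downstream primer is the reverse complement of its 3' end. The function names and the 20-nucleotide default are hypothetical and are not taken from the specification.

```python
# Illustrative sketch only: names and defaults are assumptions, not part of the specification.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_pair(target: str, length: int = 20) -> tuple[str, str]:
    """Return (upstream, downstream) primers, each written 5'->3'."""
    if not 15 <= length <= 30:
        raise ValueError("primers typically range from 15 to 30 nucleotides")
    if len(target) < 2 * length:
        raise ValueError("target too short for two non-overlapping primers")
    target = target.upper()
    upstream = target[:length]                         # copies the 5' end of the supplied strand
    downstream = reverse_complement(target[-length:])  # anneals to the 3' end of the supplied strand
    return upstream, downstream
```

In practice, primer choice also depends on melting temperature and specificity, which this sketch does not model.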
As used herein, linkage describes the tendency of genes, alleles, loci or genetic markers to be inherited together as a result of their location on the same chromosome. It can be measured by percent recombination between the two genes, alleles, loci or genetic markers.
As used herein, polymorphism refers to the occurrence of two or more genetically determined alternative sequences or alleles in a population. A polymorphic marker or site is the locus at which divergence occurs. Preferred markers have at least two alleles, each occurring at frequency of greater than 1%, and more preferably greater than 10% or 20% of a selected population. A polymorphic locus may be as small as one base pair. Polymorphic markers include restriction fragment length polymorphisms, variable number of tandem repeats (VNTR's), hypervariable regions, minisatellites, dinucleotide repeats, trinucleotide repeats, tetranucleotide repeats, simple sequence repeats, and insertion elements such as Alu. The first identified allelic form is arbitrarily designated as the reference form and other allelic forms are designated as alternative or variant alleles. The allelic form occurring most frequently in a selected population is sometimes referred to as the wildtype form. Diploid organisms may be homozygous or heterozygous for allelic forms. A diallelic or biallelic polymorphism has two forms. A triallelic polymorphism has three forms. A single nucleotide polymorphism occurs at a polymorphic site occupied by a single nucleotide, which is the site of variation between allelic sequences. The site is usually preceded by and followed by highly conserved sequences of the allele (e.g., sequences that vary in less than 1/100 or 1/1000 members of the populations).
A single nucleotide polymorphism usually arises due to substitution of one nucleotide for another at the polymorphic site. A transition is the replacement of one purine by another purine or one pyrimidine by another pyrimidine. A transversion is the replacement of a purine by a pyrimidine or vice versa. Single nucleotide polymorphisms can also arise from a deletion of a nucleotide or an insertion of a nucleotide relative to a reference allele. Typically the polymorphic site is occupied by a base other than the reference base. For example, where the reference allele contains the base "T" at the polymorphic site, the altered allele can contain a "C", "G" or "A" at the polymorphic site.
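The distinction between transitions and transversions can be expressed as a short check. The following is a minimal sketch; the function name is hypothetical, and only single-base substitutions among A, C, G and T are handled.

```python
# Illustrative sketch only: classifies a single-base substitution.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a reference/alternative base pair."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases:
        raise ValueError("expected single bases A, C, G or T")
    if ref == alt:
        raise ValueError("reference and alternative bases are identical")
    # A transition stays within one chemical class; a transversion crosses classes.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"
```

For example, an A/G or C/T polymorphism is classified as a transition, and an A/C polymorphism as a transversion.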
Hybridizations are usually performed under stringent conditions, for example, at a salt concentration of no more than 1 M and a temperature of at least 25°C. For example, conditions of 5X SSPE (750 mM NaCl, 50 mM NaPhosphate, 5 mM EDTA, pH 7.4) and a temperature of 25-30°C, or equivalent conditions, are suitable for allele-specific probe hybridizations. Equivalent conditions can be determined by varying one or more of the parameters given as an example, as known in the art, while maintaining a similar degree of identity or similarity between the target nucleotide sequence and the primer or probe used.
The term "isolated" is used herein to indicate that the material in question exists in a physical milieu distinct from that in which it occurs in nature. For example, an isolated nucleic acid of the invention may be substantially isolated with respect to the complex cellular milieu in which it naturally occurs. In some instances, the isolated material will form part of a composition (for example, a crude extract containing other substances), buffer system or reagent mix. In other circumstances, the material may be purified to essential homogeneity, for example as determined by PAGE or column chromatography such as HPLC. Preferably, an isolated nucleic acid comprises at least about 50, 80 or 90 percent (on a molar basis) of all macromolecular species present.
I. Novel Polymorphisms of the Invention
The novel polymorphisms of the invention are listed in the Table. The first column of the Table lists the names assigned to the fragments in which the polymorphisms occur. The fragments are all human genomic fragments. The sequence of one allelic form of each of the fragments (arbitrarily referred to as the prototypical or reference form) has been previously published. These sequences are listed at http://www-genome.wi.mit.edu/ (all STS's (sequence tag sites)); http://shgc.stanford.edu (Stanford STS's); and http://ww.tigr.org/ (TIGR STS's). The Web sites also list primers for amplification of the fragments, and the genomic location of fragments. Some fragments are expressed sequence tags, and some are random genomic fragments. All information in the websites concerning the fragments listed in the Table is incorporated by reference in its entirety for all purposes.
The second column lists the position in the fragment in which a polymorphic site has been found. Positions are numbered consecutively with the first base of the fragment sequence as listed in one of the above databases being assigned the number one. The third column lists the base occupying the polymorphic site in the sequence in the database. This base is arbitrarily designated the reference or prototypical form, but it is not necessarily the most frequently occurring form. The fourth column in the Table lists the alternative base(s) at the polymorphic site. The fifth column of the Table lists a 5' (upstream or forward) primer that hybridizes with the 5' end of the DNA sequence to be amplified. The sixth column of the Table lists a 3' (downstream or reverse) primer that hybridizes with the complement of the 3' end of the sequence to be amplified.
The seventh column of the Table lists a number of bases of sequence on either side of the polymorphic site in each fragment. The indicated sequences can be either DNA or RNA. In the latter, the T's shown in the Table are replaced by U's. The base occupying the polymorphic site is indicated in IUPAC-IUB ambiguity code.
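As a minimal sketch of how a column 7 entry might be read, the following expands the ambiguity code at the polymorphic site into one sequence per allele and converts a DNA segment to its RNA form. The mapping is the standard IUPAC-IUB nucleotide code; the function names and the 0-based `site` index within the segment are assumptions made for illustration (the Table itself numbers positions from one, relative to the whole fragment).

```python
# Illustrative sketch only: standard IUPAC-IUB nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def allelic_forms(segment: str, site: int) -> list[str]:
    """Expand the ambiguity code at 0-based index `site` into one sequence per allele."""
    segment = segment.upper()
    return [segment[:site] + base + segment[site + 1:] for base in IUPAC[segment[site]]]


def to_rna(segment: str) -> str:
    """Return the RNA form of a DNA segment (T replaced by U)."""
    return segment.upper().replace("T", "U")
```

For instance, a segment with `R` at the polymorphic site expands to one form carrying A and one carrying G.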
II. Analysis of Polymorphisms
A. Preparation of Samples
Polymorphisms are detected in a target nucleic acid from an individual being analyzed. For assay of genomic DNA, virtually any biological sample (other than pure red blood cells) is suitable. For example, convenient tissue samples include whole blood, semen, saliva, tears, urine, fecal material, sweat, buccal, skin and hair. For assay of cDNA or mRNA, the tissue sample must be obtained from an organ in which the target nucleic acid is expressed. For example, if the target nucleic acid is a cytochrome P450, the liver is a suitable source.
Many of the methods described below require amplification of DNA from target samples. This can be accomplished by, e.g., PCR. See generally PCR Technology: Principles and Applications for DNA Amplification (ed. H.A. Erlich, Freeman Press, NY, NY, 1992); PCR Protocols: A Guide to Methods and Applications (eds. Innis, et al., Academic Press, San Diego, CA, 1990); Mattila et al., Nucleic Acids Res. 19, 4967 (1991); Eckert et al., PCR Methods and Applications 1, 17 (1991); PCR (eds. McPherson et al., IRL Press, Oxford); and U.S. Patent 4,683,202.
Other suitable amplification methods include the ligase chain reaction (LCR) (see Wu and Wallace, Genomics 4, 560 (1989); Landegren et al., Science 241, 1077 (1988)), transcription amplification (Kwoh et al., Proc. Natl. Acad. Sci. USA 86, 1173 (1989)), self-sustained sequence replication (Guatelli et al., Proc. Nat. Acad. Sci. USA 87, 1874 (1990)) and nucleic acid based sequence amplification (NASBA). The latter two amplification methods involve isothermal reactions based on isothermal transcription, which produce both single stranded RNA (ssRNA) and double stranded DNA (dsDNA) as the amplification products in a ratio of about 30 or 100 to 1, respectively.
B. Detection of Polymorphisms in Target DNA
There are two distinct types of analysis of target DNA for detecting polymorphisms. The first type of analysis, sometimes referred to as de novo characterization, is carried out to identify polymorphic sites not previously characterized (i.e., to identify new polymorphisms). This analysis compares target sequences in different individuals to identify points of variation, i.e., polymorphic sites. By analyzing groups of individuals representing the greatest ethnic diversity among humans and greatest breed and species variety in plants and animals, patterns characteristic of the most common alleles/haplotypes of the locus can be identified, and the frequencies of such alleles/haplotypes in the population can be determined. Additional allelic frequencies can be determined for subpopulations characterized by criteria such as geography, race, or gender. The de novo identification of polymorphisms of the invention is described in the Examples section.
The second type of analysis determines which form(s) of a characterized (known) polymorphism are present in individuals under test. There are a variety of suitable procedures, which are discussed in turn.
1. Allele-Specific Probes
The design and use of allele-specific probes for analyzing polymorphisms is described by, e.g., Saiki et al., Nature 324, 163-166 (1986); Dattagupta, EP 235,726; Saiki, WO 89/11548. Allele-specific probes can be designed that hybridize to a segment of target DNA from one individual but do not hybridize to the corresponding segment from another individual due to the presence of different polymorphic forms in the respective segments from the two individuals. Hybridization conditions should be sufficiently stringent that there is a significant difference in hybridization intensity between alleles, and preferably an essentially binary response, whereby a probe hybridizes to only one of the alleles. Some probes are designed to hybridize to a segment of target DNA such that the polymorphic site aligns with a central position (e.g., in a 15-mer at the 7 position; in a 16-mer, at either the 8 or 9 position) of the probe. This design of probe achieves good discrimination in hybridization between different allelic forms.
Allele-specific probes are often used in pairs, one member of a pair showing a perfect match to a reference form of a target sequence and the other member showing a perfect match to a variant form. Several pairs of probes can then be immobilized on the same support for simultaneous analysis of multiple polymorphisms within the same target sequence.
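The pairing of a reference-matched and a variant-matched probe can be sketched as follows. This is a minimal illustration under stated assumptions: `context` is the listed strand written 5' to 3', `site` is the 0-based index of the polymorphic site within it, and the site is placed at position 7 of a 15-base target window, following the 15-mer example above. The function names are hypothetical. Each returned probe is the reverse complement of its target window, so the polymorphic base falls at position 9 when counted from the 5' end of the probe.

```python
# Illustrative sketch only: names, defaults and orientation conventions are assumptions.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def probe_pair(context: str, site: int, ref: str, alt: str,
               length: int = 15, site_pos: int = 7) -> dict[str, str]:
    """Return probes perfectly matched to the reference and variant forms.

    site_pos is the 1-based position of the polymorphic site in the target window.
    """
    start = site - (site_pos - 1)
    end = start + length
    if start < 0 or end > len(context):
        raise ValueError("not enough flanking sequence for this probe length")
    probes = {}
    for name, base in (("reference", ref), ("variant", alt)):
        # Target window carrying the chosen allele at the polymorphic site.
        window = context[start:site] + base + context[site + 1:end]
        probes[name] = reverse_complement(window)
    return probes
```

The two probes differ at a single base, which is what allows stringent hybridization conditions to discriminate between the allelic forms.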
2. Tiling Arrays
The polymorphisms can also be identified by hybridization to nucleic acid arrays, some examples of which are described in WO 95/11995. One form of such arrays is described in the Examples section in connection with de novo identification of polymorphisms. The same array or a different array can be used for analysis of characterized polymorphisms. WO 95/11995 also describes subarrays that are optimized for detection of a variant form of a precharacterized polymorphism. Such a subarray contains probes designed to be complementary to a second reference sequence, which is an allelic variant of the first reference sequence. The second group of probes is designed by the same principles as described in the Examples, except that the probes exhibit complementarity to the second reference sequence. The inclusion of a second group (or further groups) can be particularly useful for analyzing short subsequences of the primary reference sequence in which multiple mutations are expected to occur within a short distance commensurate with the length of the probes (e.g., two or more mutations within 9 to 21 bases).
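The tiling scheme of the Examples (four probes per reference nucleotide, identical except at the interrogation position, with the target base read as the complement of the interrogation base on the brightest probe) can be sketched as follows. This is a minimal illustration for one strand only. Placing the interrogation position at the center of an odd-length probe is an assumption, as are the function names; probes are written aligned base-for-base with the reference (that is, 3' to 5') to keep the indexing simple.

```python
# Illustrative sketch only: central interrogation position and names are assumptions.
COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}


def tile(reference: str, length: int = 25) -> dict[int, dict[str, str]]:
    """Map each tiled reference index to four probes keyed by interrogation base."""
    if length % 2 == 0:
        raise ValueError("this sketch assumes an odd probe length")
    reference = reference.upper()
    half = length // 2
    tiles = {}
    for i in range(half, len(reference) - half):
        window = reference[i - half:i + half + 1]
        perfect = "".join(COMP[b] for b in window)  # perfectly complementary probe
        # Four corresponding probes: one per base at the interrogation position.
        tiles[i] = {b: perfect[:half] + b + perfect[half + 1:] for b in "ACGT"}
    return tiles


def call_base(intensities: dict[str, float]) -> str:
    """Call the target base from the intensities of four corresponding probes."""
    brightest = max(intensities, key=intensities.get)
    return COMP[brightest]
```

A call that differs from the reference base at a tiled position would flag a candidate polymorphism for that individual; real analyses also weigh intensity ratios and the flanking "footprint", which this sketch omits.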
3. Allele-Specific Primers
An allele-specific primer hybridizes to a site on target DNA overlapping a polymorphism and only primes amplification of an allelic form to which the primer exhibits perfect complementarity. See Gibbs, Nucleic Acid Res. 17, 2427-2448 (1989). This primer is used in conjunction with a second primer which hybridizes at a distal site. Amplification proceeds from the two primers, resulting in a detectable product which indicates the particular allelic form is present. A control is usually performed with a second pair of primers, one of which shows a single base mismatch at the polymorphic site and the other of which exhibits perfect complementarity to a distal site. The single-base mismatch prevents amplification and no detectable product is formed. The method works best when the mismatch is included in the 3'-most position of the oligonucleotide aligned with the polymorphism because this position is most destabilizing to elongation from the primer (see, e.g., WO 93/22456).
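A minimal sketch of allele-specific primers with the polymorphic base at the 3'-most position is given below. It assumes the primer has the same sequence as the listed strand (and so anneals to the complementary strand), that `site` is a 0-based index into `context`, and that extension occurs only when the 3' base matches the allele present. That last point is an idealization of the mismatch effect described above. All names are hypothetical.

```python
# Illustrative sketch only: an idealized model of 3'-mismatch discrimination.
def allele_specific_primers(context: str, site: int, alleles: str,
                            length: int = 20) -> dict[str, str]:
    """Return one primer per allele, each ending (3' end) on the polymorphic site."""
    start = site - length + 1
    if start < 0:
        raise ValueError("not enough upstream sequence for this primer length")
    stem = context[start:site].upper()  # shared bases 5' of the polymorphic site
    return {allele.upper(): stem + allele.upper() for allele in alleles}


def primes_template(primer: str, template_allele: str) -> bool:
    """Idealized rule: amplification proceeds only if the primer's 3' base matches."""
    return primer[-1] == template_allele.upper()
```

Used with a common distal primer, the primer for the allele actually present would be expected to yield product, while the mismatched primer serves as the control.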
4. Direct-Sequencing
The direct analysis of the sequence of polymorphisms of the present invention can be accomplished using either the dideoxy chain termination method or the Maxam-Gilbert method (see Sambrook et al., Molecular Cloning, A Laboratory Manual (2nd Ed., CSHP, New York 1989); Zyskind et al., Recombinant DNA Laboratory Manual (Acad. Press, 1988)).
5. Denaturing Gradient Gel Electrophoresis
Amplification products generated using the polymerase chain reaction can be analyzed by the use of denaturing gradient gel electrophoresis. Different alleles can be identified based on the different sequence-dependent melting properties and electrophoretic migration of DNA in solution. See Erlich, ed., PCR Technology, Principles and Applications for DNA Amplification (W.H. Freeman and Co, New York, 1992), Chapter 7.
6. Single-Strand Conformation Polymorphism Analysis
Alleles of target sequences can be differentiated using single-strand conformation polymorphism analysis, which identifies base differences by alteration in electrophoretic migration of single stranded PCR products, as described in Orita et al., Proc. Nat. Acad. Sci. 86, 2766-2770 (1989). Amplified PCR products can be generated as described above, and heated or otherwise denatured, to form single stranded amplification products. Single-stranded nucleic acids may refold or form secondary structures which are partially dependent on the base sequence. The different electrophoretic mobilities of single-stranded amplification products can be related to base-sequence differences between alleles of target sequences.
III. Methods of Use
After determining polymorphic form(s) present in an individual at one or more polymorphic sites, this information can be used in a number of methods.
A. Forensics
Determination of which polymorphic forms occupy a set of polymorphic sites in an individual identifies a set of polymorphic forms that distinguishes the individual. See generally National Research Council, The Evaluation of Forensic DNA Evidence (Eds. Pollard et al., National Academy Press, DC, 1996). The more sites that are analyzed, the lower the probability that the set of polymorphic forms in one individual is the same as that in an unrelated individual. Preferably, if multiple sites are analyzed, the sites are unlinked. Thus, polymorphisms of the invention are often used in conjunction with polymorphisms in distal genes. Preferred polymorphisms for use in forensics are biallelic because the population frequencies of two polymorphic forms can usually be determined with greater accuracy than those of multiple polymorphic forms at multi-allelic loci.
The capacity to identify a distinguishing or unique set of forensic markers in an individual is useful for forensic analysis. For example, one can determine whether a blood sample from a suspect matches a blood or other tissue sample from a crime scene by determining whether the set of polymorphic forms occupying selected polymorphic sites is the same in the suspect and the sample. If the set of polymorphic markers does not match between a suspect and a sample, it can be concluded (barring experimental error) that the suspect was not the source of the sample. If the set of markers does match, one can conclude that the DNA from the suspect is consistent with that found at the crime scene. If frequencies of the polymorphic forms at the loci tested have been determined (e.g., by analysis of a suitable population of individuals), one can perform a statistical analysis to determine the probability that a match of suspect and crime scene sample would occur by chance.
p(ID) is the probability that two random individuals have the same polymorphic or allelic form at a given polymorphic site. In biallelic loci, four genotypes are possible: AA, AB, BA, and BB. If alleles A and B occur in a haploid genome of the organism with frequencies x and y, the probability of each genotype in a diploid organism is (see WO 95/12607):
Homozygote: p(AA) = x^2
Homozygote: p(BB) = y^2 = (1-x)^2
Single Heterozygote: p(AB) = p(BA) = xy = x(1-x)
Both Heterozygotes: p(AB+BA) = 2xy = 2x(1-x)
The probability of identity at one locus (i.e., the probability that two individuals, picked at random from a population, will have identical polymorphic forms at a given locus) is given by the equation:
p(ID) = (x^2)^2 + (2xy)^2 + (y^2)^2
These calculations can be extended for any number of polymorphic forms at a given locus. For example, the probability of identity p(ID) for a 3-allele system where the alleles have the frequencies in the population of x, y and z, respectively, is equal to the sum of the squares of the genotype frequencies:
p(ID) = x^4 + (2xy)^2 + (2yz)^2 + (2xz)^2 + z^4 + y^4
In a locus of n alleles, the appropriate binomial expansion is used to calculate p(ID) and p(exc).
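By way of illustration only, the per-locus calculation can be sketched in Python for any number of alleles. The function names `genotype_frequencies` and `probability_of_identity` are hypothetical and are not taken from the cited references. The sketch assumes random mating, so that homozygote frequencies are squares of allele frequencies and heterozygote frequencies are twice the product of the two allele frequencies, as in the formulas above.

```python
from itertools import combinations


def genotype_frequencies(allele_freqs):
    """Return expected genotype frequencies for one locus.

    Keys are (i, j) allele-index pairs with i <= j, so each heterozygote
    (AB and BA together) appears once with frequency 2xy.
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    freqs = {}
    for i, f in enumerate(allele_freqs):
        freqs[(i, i)] = f * f  # homozygote: x^2
    for i, j in combinations(range(len(allele_freqs)), 2):
        freqs[(i, j)] = 2 * allele_freqs[i] * allele_freqs[j]  # heterozygote: 2xy
    return freqs


def probability_of_identity(allele_freqs):
    """p(ID): the sum of the squared genotype frequencies at one locus."""
    return sum(g * g for g in genotype_frequencies(allele_freqs).values())


if __name__ == "__main__":
    print(probability_of_identity([0.5, 0.5]))       # biallelic locus
    print(probability_of_identity([0.5, 0.3, 0.2]))  # 3-allele locus
```

For a biallelic locus with x = y = 0.5 the sum is 0.0625 + 0.25 + 0.0625 = 0.375, which is the smallest p(ID) a biallelic locus can give under these assumptions.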
The cumulative probability of identity (cum p(ID)) for each of multiple unlinked loci is determined by multiplying the probabilities provided by each locus:
cum p(ID) = p(ID1) p(ID2) p(ID3) ... p(IDn)
The cumulative probability of non-identity for n loci (i.e., the probability that two random individuals will be different at 1 or more loci) is given by the equation:
cum p(nonID) = 1 - cum p(ID)
If several polymorphic loci are tested, the cumulative probability of non-identity for random individuals becomes very high (e.g., one billion to one). Such probabilities can be taken into account together with other evidence in determining the guilt or innocence of the suspect.
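The multiplication across unlinked loci can be sketched as follows. This is a minimal illustration, not a forensic tool: the helper names are hypothetical, the loci are assumed to be independent (unlinked), and `loci_needed` additionally assumes that every locus has the same p(ID).

```python
import math


def cumulative_p_id(per_locus_p_id):
    """cum p(ID): product of the per-locus probabilities of identity."""
    return math.prod(per_locus_p_id)


def cumulative_p_non_id(per_locus_p_id):
    """cum p(nonID) = 1 - cum p(ID)."""
    return 1.0 - cumulative_p_id(per_locus_p_id)


def loci_needed(p_id_per_locus, target_cum_p_id):
    """Smallest number of equally informative loci with cum p(ID) <= target."""
    if not 0.0 < p_id_per_locus < 1.0:
        raise ValueError("per-locus p(ID) must lie strictly between 0 and 1")
    if target_cum_p_id <= 0.0:
        raise ValueError("target must be positive")
    n, cum = 0, 1.0
    while cum > target_cum_p_id:
        cum *= p_id_per_locus
        n += 1
    return n


if __name__ == "__main__":
    # Biallelic loci with equal allele frequencies have p(ID) = 0.375.
    print(loci_needed(0.375, 1e-9))
    print(cumulative_p_non_id([0.375] * 10))
```

Under these assumptions, 22 biallelic loci with equal allele frequencies are enough to bring the cumulative probability of identity below one in a billion.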
B. Paternity Testing
The object of paternity testing is usually to determine whether a male is the father of a child. In most cases, the mother of the child is known and thus, the mother's contribution to the child's genotype can be traced. Paternity testing investigates whether the part of the child's genotype not attributable to the mother is consistent with that of the putative father. Paternity testing can be performed by analyzing sets of polymorphisms in the putative father and the child. If the set of polymorphisms in the child attributable to the father does not match the set of polymorphisms of the putative father, it can be concluded, barring experimental error, that the putative father is not the real father. If the set of polymorphisms in the child attributable to the father does match the set of polymorphisms of the putative father, a statistical calculation can be performed to determine the probability of coincidental match.
The probability of parentage exclusion (representing the probability that a random male will have a polymorphic form at a given polymorphic site that makes him incompatible as the father) is given by the equation (see WO 95/12607):
p(exc) = xy(1-xy)
where x and y are the population frequencies of alleles A and B of a biallelic polymorphic site.
(At a triallelic site, p(exc) = xy(1-xy) + yz(1-yz) + xz(1-xz) + 3xyz(1-xyz), where x, y and z are the respective population frequencies of alleles A, B and C.)
The probability of non-exclusion is:
p(non-exc) = 1 - p(exc)
The cumulative probability of non-exclusion (representing the value obtained when n loci are used) is thus:
cum p(non-exc) = p(non-exc1) p(non-exc2) p(non-exc3) ... p(non-excn)
The cumulative probability of exclusion for n loci (representing the probability that a random male will be excluded) is:
cum p(exc) = 1 - cum p(non-exc)
If several polymorphic loci are included in the analysis, the cumulative probability of exclusion of a random male is very high. This probability can be taken into account in assessing the liability of a putative father whose polymorphic marker set matches the child's polymorphic marker set attributable to his/her father.
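The exclusion formulas for biallelic sites can be sketched in Python as shown below. The function names are hypothetical, and the loci are assumed to be unlinked so that the per-locus non-exclusion probabilities can be multiplied.

```python
import math


def p_exclusion_biallelic(x):
    """p(exc) = xy(1 - xy) for a biallelic site with frequencies x and y = 1 - x."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    xy = x * (1.0 - x)
    return xy * (1.0 - xy)


def cumulative_p_exclusion(per_locus_p_exc):
    """cum p(exc) = 1 - product of the per-locus p(non-exc) = 1 - p(exc)."""
    cum_non_exc = math.prod(1.0 - p for p in per_locus_p_exc)
    return 1.0 - cum_non_exc


if __name__ == "__main__":
    per_locus = p_exclusion_biallelic(0.5)  # 0.25 * 0.75 = 0.1875
    for n in (10, 20, 40):
        print(n, cumulative_p_exclusion([per_locus] * n))
```

Because a single biallelic site excludes at most 18.75% of random males (at x = y = 0.5), the loop illustrates why the analysis relies on several loci rather than one.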
C. Correlation of Polymorphisms with Phenotypic Traits
The polymorphisms of the invention may contribute to the phenotype of an organism in different ways. Some polymorphisms occur within a protein coding sequence and contribute to phenotype by affecting protein structure. The effect may be neutral, beneficial or detrimental, or both beneficial and detrimental, depending on the circumstances. For example, a heterozygous sickle cell mutation confers resistance to malaria, but a homozygous sickle cell mutation is usually lethal. Other polymorphisms occur in noncoding regions but may exert phenotypic effects indirectly via influence on replication, transcription, and translation. A single polymorphism may affect more than one phenotypic trait. Likewise, a single phenotypic trait may be affected by polymorphisms in different genes. Further, some polymorphisms predispose an individual to a distinct mutation that is causally related to a certain phenotype.
Phenotypic traits include diseases that have known but hitherto unmapped genetic components (e.g., agammaglobulinemia, diabetes insipidus, Lesch-Nyhan syndrome, muscular dystrophy, Wiskott-Aldrich syndrome, Fabry's disease, familial hypercholesterolemia, polycystic kidney disease, hereditary spherocytosis, von Willebrand's disease, tuberous sclerosis, hereditary hemorrhagic telangiectasia, familial colonic polyposis, Ehlers-Danlos syndrome, osteogenesis imperfecta, and acute intermittent porphyria). Phenotypic traits also include symptoms of, or susceptibility to, multifactorial diseases of which a component is or may be genetic, such as autoimmune diseases, inflammation, cancer, diseases of the nervous system, and infection by pathogenic microorganisms. Some examples of autoimmune diseases include rheumatoid arthritis, multiple sclerosis, diabetes (insulin-dependent and non-insulin-dependent), systemic lupus erythematosus and Graves disease. Some examples of cancers include cancers of the bladder, brain, breast, colon, esophagus, kidney, leukemia, liver, lung, oral cavity, ovary, pancreas, prostate, skin, stomach and uterus. Phenotypic traits also include characteristics such as longevity, appearance
(e.g., baldness, obesity), strength, speed, endurance, fertility, and susceptibility or receptivity to particular drugs or therapeutic treatments.
Correlation is performed for a population of individuals who have been tested for the presence or absence of a phenotypic trait of interest and for polymorphic marker sets. To perform such analysis, the presence or absence of a set of polymorphisms (i.e., a polymorphic set) is determined for a set of the individuals, some of whom exhibit a particular trait, and some of whom lack the trait. The alleles of each polymorphism of the set are then reviewed to determine whether the presence or absence of a particular allele is associated with the trait of interest. Correlation can be performed by standard statistical methods such as a chi-squared test, and statistically significant correlations between polymorphic form(s) and phenotypic characteristics are noted. For example, it might be found that the presence of allele A1 at polymorphism A correlates with heart disease. As a further example, it might be found that the combined presence of allele A1 at polymorphism A and allele B1 at polymorphism B correlates with increased milk production of a farm animal.
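One way to carry out such a test for a single allele and a single trait is a Pearson chi-squared test on a 2x2 table of counts. The sketch below is illustrative only: the function name and the counts are hypothetical, no continuity correction is applied, and the cutoff of 3.841 is the standard 5% critical value of the chi-squared distribution with one degree of freedom.

```python
CRITICAL_VALUE_5_PERCENT_DF1 = 3.841  # chi-squared, 1 degree of freedom, alpha = 0.05


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]].

    a: allele present, trait present    b: allele present, trait absent
    c: allele absent,  trait present    d: allele absent,  trait absent
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must contain at least one individual")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


if __name__ == "__main__":
    # Hypothetical counts: 40 carriers of allele A1 and 60 non-carriers.
    statistic = chi_squared_2x2(30, 10, 20, 40)
    print(statistic, statistic > CRITICAL_VALUE_5_PERCENT_DF1)
```

When many polymorphisms are tested against the same trait, the significance threshold would ordinarily be adjusted for multiple comparisons; that step is omitted here.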
Such correlations can be exploited in several ways. In the case of a strong correlation between a set of one or more polymorphic forms and a disease for which treatment is available, detection of the polymorphic form set in a human or animal patient may justify immediate administration of treatment, or at least the institution of regular monitoring of the patient. Detection of a polymorphic form correlated with serious disease in a couple contemplating a family may also be valuable to the couple in their reproductive decisions. For example, the female partner might elect to undergo in vitro fertilization to avoid the possibility of transmitting such a polymorphism from her husband to her offspring. In the case of a weaker, but still statistically significant correlation between a polymorphic set and human disease, immediate therapeutic intervention or monitoring may not be justified. Nevertheless, the patient can be motivated to begin simple life-style changes (e.g., diet, exercise) that can be accomplished at little cost to the patient but confer potential benefits in reducing the risk of conditions to which the patient may have increased susceptibility by virtue of variant alleles. Identification of a polymorphic set in a patient correlated with enhanced receptiveness to one of several treatment regimes for a disease indicates that this treatment regime should be followed.
For animals and plants, correlations between polymorphisms and phenotype are useful for breeding for desired characteristics. For example, Beitz et al., US 5,292,639 discuss use of bovine mitochondrial polymorphisms in a breeding program to improve milk production in cows. To evaluate the effect of mtDNA D-loop sequence polymorphism on milk production, each cow was assigned a value of 1 if variant or 0 if wildtype with respect to a prototypical mitochondrial DNA sequence at each of 17 locations considered. Each production trait was analyzed individually with the following animal model:
Y_ijkpn = μ + YS_i + P_j + X_k + β_1 + ... + β_17 + PE_n + a_n + e_p
where Y_ijkpn is the milk, fat, fat percentage, SNF, SNF percentage, energy concentration, or lactation energy record; μ is an overall mean; YS_i is the effect common to all cows calving in year-season; X_k is the effect common to cows in either the high or average selection line; β_1 to β_17 are the binomial regressions of production record on mtDNA D-loop sequence polymorphisms; PE_n is permanent environmental effect common to all records of cow n; a_n is effect of animal n and is composed of the additive genetic contribution of sire and dam breeding values and a Mendelian sampling effect; and e_p is a random residual. It was found that eleven of seventeen polymorphisms tested influenced at least one production trait. Bovines having the best polymorphic forms for milk production at these eleven loci are used as parents for breeding the next generation of the herd.
D. Genetic Mapping of Phenotypic Traits
The previous section concerns identifying correlations between phenotypic traits and polymorphisms that directly or indirectly contribute to those traits. The present section describes identification of a physical linkage between a genetic locus associated with a trait of interest and polymorphic markers that are not associated with the trait, but are in physical proximity with the genetic locus responsible for the trait and co-segregate with it. Such analysis is useful for mapping a genetic locus associated with a phenotypic trait to a chromosomal position, and thereby cloning gene(s) responsible for the trait. See Lander et al., Proc. Natl. Acad. Sci. (USA) 83, 7353-7357 (1986); Lander et al., Proc. Natl. Acad. Sci. (USA) 84, 2363-2367 (1987); Donis-Keller et al., Cell 51, 319-337 (1987); Lander et al., Genetics 121, 185-199 (1989). Genes localized by linkage can be cloned by a process known as directional cloning. See Wainwright, Med. J. Australia 159, 170-174 (1993); Collins, Nature Genetics 1, 3-6 (1992).
Linkage studies are typically performed on members of a family. Available members of the family are characterized
for the presence or absence of a phenotypic trait and for a set of polymorphic markers. The distribution of polymorphic markers in an informative meiosis is then analyzed to determine which polymorphic markers co-segregate with a phenotypic trait. See, e.g., Kerem et al., Science 245, 1073-1080 (1989); Monaco et al., Nature 316, 842 (1985); Yamoka et al., Neurology 40, 222-226 (1990); Rossiter et al., FASEB Journal 5, 21-27 (1991). Linkage is analyzed by calculation of LOD (log of the odds) values. A lod value is the relative likelihood of obtaining observed segregation data for a marker and a genetic locus when the two are located at a recombination fraction θ, versus the situation in which the two are not linked, and thus segregating independently (Thompson & Thompson, Genetics in Medicine (5th ed., W.B. Saunders Company, Philadelphia, 1991); Strachan, "Mapping the human genome" in The Human Genome (BIOS Scientific Publishers Ltd, Oxford), Chapter 4). A series of likelihood ratios are calculated at various recombination fractions (θ), ranging from θ = 0.0 (coincident loci) to θ = 0.50 (unlinked). Thus, the likelihood ratio at a given value of θ is the probability of the data if the loci are linked at θ divided by the probability of the data if the loci are unlinked. The computed likelihood ratios are usually expressed as the log10 of this ratio (i.e., a lod score). For example, a lod score of 3 indicates 1000:1 odds against an apparent observed linkage being a coincidence. The use of logarithms allows data collected from different families to be combined by simple addition. Computer programs are available for the calculation of lod scores for differing values of θ (e.g., LIPED, MLINK (Lathrop, Proc. Natl. Acad. Sci. (USA) 81, 3443-3446 (1984))). For any particular lod score, a recombination fraction may be determined from mathematical tables. See Smith et al., Mathematical tables for research workers in human genetics (Churchill, London, 1961); Smith, Ann. Hum. Genet. 32, 127-150 (1968). The value of θ at which the lod score is the highest is considered to be the best estimate of the recombination fraction.
Positive lod score values suggest that the two loci are linked, whereas negative values suggest that linkage is less likely (at that value of θ) than the possibility that the two loci are unlinked. By convention, a combined lod score of +3 or greater (equivalent to greater than 1000:1 odds in favor of linkage) is considered definitive evidence that two loci are linked. Similarly, by convention, a negative lod score of -2 or less is taken as definitive evidence against linkage of the two loci being compared. Negative linkage data are useful in excluding a chromosome or a segment thereof from consideration. The search focuses on the remaining non-excluded chromosomal locations.
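The lod calculation can be illustrated for the simplest case. The sketch below assumes phase-known, fully informative meioses, in which each meiosis can be scored directly as recombinant or non-recombinant; programs such as LIPED and MLINK handle general pedigrees and are not reproduced here. All function names are hypothetical. Under this assumption the likelihood of the data at recombination fraction θ is θ^r (1-θ)^(n-r) for r recombinants in n meioses, and the likelihood under no linkage is 0.5^n.

```python
import math


def lod_score(recombinants, meioses, theta):
    """log10 of L(theta) / L(0.5) for phase-known, fully informative meioses."""
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    non_recombinants = meioses - recombinants
    if theta == 0.0:
        # Any recombinant is impossible when the loci coincide.
        return float("-inf") if recombinants > 0 else meioses * math.log10(2.0)
    log_linked = (recombinants * math.log10(theta)
                  + non_recombinants * math.log10(1.0 - theta))
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


def combined_lod(families, theta):
    """Add lod scores from several families, given as (recombinants, meioses) pairs."""
    return sum(lod_score(r, n, theta) for r, n in families)


def best_theta(recombinants, meioses, steps=50):
    """Grid value of theta in [0, 0.5] at which the lod score is highest."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda t: lod_score(recombinants, meioses, t))


def interpret(lod):
    """Apply the conventional thresholds of +3 and -2."""
    if lod >= 3.0:
        return "linked"
    if lod <= -2.0:
        return "linkage excluded"
    return "inconclusive"


if __name__ == "__main__":
    theta_hat = best_theta(1, 10)
    print(theta_hat, lod_score(1, 10, theta_hat))
    print(interpret(combined_lod([(1, 10), (0, 8)], 0.1)))
```

In this example one family with 1 recombinant in 10 meioses is inconclusive on its own, while adding a second family with 0 recombinants in 8 meioses raises the combined lod score at θ = 0.1 above the threshold of 3.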
IV. Modified Polypeptides and Gene Sequences
The invention further provides variant forms of nucleic acids and corresponding proteins. The nucleic acids comprise one of the sequences described in the Table, column 8, in which the polymorphic position is occupied by one of the alternative bases for that position. Some nucleic acids encode full-length variant forms of proteins. Similarly, variant proteins have the prototypical amino acid sequences encoded by nucleic acid sequences shown in the Table, column 8 (read so as to be in-frame with the full-length coding sequence of which it is a component), except at an amino acid encoded by a codon including one of the polymorphic positions shown in the Table. That position is occupied by the amino acid coded by the corresponding codon in any of the alternative forms shown in the Table.
Variant genes can be expressed in an expression vector in which a variant gene is operably linked to a native or other promoter. Usually, the promoter is a eukaryotic promoter for expression in a mammalian cell. The transcription regulation sequences typically include a heterologous promoter and optionally an enhancer which is recognized by the host. The selection of an appropriate promoter, for example trp, lac, phage promoters, glycolytic enzyme promoters and tRNA promoters, depends on the host selected. Commercially available expression vectors can be used. Vectors can include host-recognized replication systems, amplifiable genes, selectable markers, host sequences useful for insertion into the host genome, and the like.
The means of introducing the expression construct into a host cell varies depending upon the particular construction and the target host. Suitable means include fusion, conjugation, transfection, transduction, electroporation or injection, as described in Sambrook, supra. A wide variety of host cells can be employed for expression of the variant gene, both prokaryotic and eukaryotic. Suitable host cells include bacteria such as E. coli, yeast, filamentous fungi, insect cells, mammalian cells, typically immortalized, e.g., mouse, CHO, human and monkey cell lines and derivatives thereof. Preferred host cells are able to process the variant gene product to produce an appropriate mature polypeptide. Processing includes glycosylation, ubiquitination, disulfide bond formation, general post-translational modification, and the like.
The protein may be isolated by conventional means of protein biochemistry and purification to obtain a substantially pure product, i.e., 80, 95 or 99% free of cell component contaminants, as described in Jacoby, Methods in Enzymology Volume 104, Academic Press, New York (1984); Scopes, Protein Purification, Principles and Practice, 2nd Edition, Springer-Verlag, New York (1987); and Deutscher (ed), Guide to Protein Purification, Methods in Enzymology, Vol. 182 (1990). If the protein is secreted, it can be isolated from the supernatant in which the host cell is grown. If not secreted, the protein can be isolated from a lysate of the host cells.
The invention further provides transgenic nonhuman animals capable of expressing an exogenous variant gene and/or having one or both alleles of an endogenous variant gene inactivated. Expression of an exogenous variant gene is usually achieved by operably linking the gene to a promoter and optionally an enhancer, and microinjecting the construct into a zygote. See Hogan et al., "Manipulating the Mouse Embryo, A Laboratory Manual," Cold Spring Harbor Laboratory. Inactivation of endogenous variant genes can be achieved by forming a transgene in which a cloned variant gene is inactivated by insertion of a positive selection marker. See Capecchi, Science 244, 1288-1292 (1989). The transgene is then introduced into an embryonic stem cell, where it undergoes homologous recombination with an endogenous variant gene. Mice and other rodents are preferred animals. Such animals provide useful drug screening systems.
In addition to substantially full-length polypeptides expressed by variant genes, the present invention includes biologically active fragments of the polypeptides, or analogs thereof, including organic molecules which simulate the interactions of the peptides. Biologically active fragments include any portion of the full-length polypeptide which confers a biological function on the variant gene product, including ligand binding, and antibody binding. Ligand binding includes binding by nucleic acids, proteins or polypeptides, small biologically active molecules, or large cellular structures.
Polyclonal and/or monoclonal antibodies that specifically bind to variant gene products but not to corresponding prototypical gene products are also provided. Antibodies can be made by injecting mice or other animals with the variant gene product or synthetic peptide fragments thereof. Monoclonal antibodies are screened as described, for example, in Harlow & Lane, Antibodies, A Laboratory Manual, Cold Spring Harbor Press, New York (1988); Goding, Monoclonal antibodies, Principles and Practice (2d ed.) Academic Press, New York (1986). Monoclonal antibodies are tested for specific immunoreactivity with a variant gene product and lack of immunoreactivity to the corresponding prototypical gene product. These antibodies are useful in diagnostic assays for detection of the variant form, or as an active ingredient in a pharmaceutical composition.
V. Kits
The invention further provides kits comprising at least one allele-specific oligonucleotide as described above. Often, the kits contain one or more pairs of allele-specific oligonucleotides hybridizing to different forms of a polymorphism. In some kits, the allele-specific oligonucleotides are provided immobilized to a substrate. For example, the same substrate can comprise allele-specific oligonucleotide probes for detecting at least 10, 100 or all of the polymorphisms shown in the Table. Optional additional components of the kit include, for example, restriction enzymes, reverse-transcriptase or polymerase, the substrate nucleoside triphosphates, means used to label (for example, an avidin-enzyme conjugate and enzyme substrate and chromogen if the label is biotin), and the appropriate buffers for reverse transcription, PCR, or hybridization reactions. Usually, the kit also contains instructions for carrying out the methods.
The following Examples are offered for the purpose of illustrating the present invention and are not to be construed to limit the scope of this invention. The teachings of all references cited herein are hereby incorporated herein by reference.
EXAMPLES
The polymorphisms shown in the Table were identified by resequencing of target sequences from three to ten unrelated individuals of diverse ethnic and geographic backgrounds by hybridization to probes immobilized to microfabricated arrays or conventional sequencing. The strategy and principles for design and use of such arrays are generally described in WO 95/11995. The strategy provides arrays of probes for analysis of target sequences showing a high degree of sequence identity to the reference sequences of the fragments shown in the Table, column 1. The reference sequences were sequence-tagged sites (STSs) developed in the course of the Human Genome Project (see, e.g., Science 270, 1945-1954 (1995); Nature 380, 152-154 (1996)). Most STSs ranged from 100 bp to 300 bp in size. A typical probe array used in this analysis has two groups of four sets of probes that respectively tile both strands of a reference sequence. A first probe set comprises a plurality of probes exhibiting perfect complementarity with one of the reference sequences. Each probe in the first probe set has an interrogation position that corresponds to a nucleotide in the reference sequence. That is, the interrogation position is aligned with the corresponding nucleotide in the reference sequence, when the probe and reference sequence are aligned to maximize complementarity between the two. For each probe in the first set, there are three corresponding probes from three additional probe sets. Thus, there are four probes corresponding to each nucleotide in the reference sequence. The probes from the three additional probe sets are identical to the corresponding probe from the first probe set except at the interrogation position, which occurs in the same position in each of the four corresponding probes from the four probe sets, and is occupied by a different nucleotide in the four probe sets. In the present analysis, probes were 25 nucleotides long. Arrays tiled for multiple different reference sequences were included on the same substrate.
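The tiling scheme described above can be sketched for one strand as follows. This is a simplified illustration, not the array design of WO 95/11995. The function names are hypothetical, and placing the interrogation position at the center of each 25-nucleotide probe (offset 12) is an assumption, since the text specifies only the probe length. Each probe is written as the reverse complement of the target window it is intended to match.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence over A, C, G, T."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def tile_probes(reference, probe_length=25, interrogation_offset=12):
    """Yield (reference_position, probes) for each fully covered position.

    probes maps each possible target base to the probe that is perfectly
    complementary to the reference window carrying that base at the
    interrogated position. The entry keyed by the actual reference base is
    the first-set (perfect match) probe; the other three entries are the
    corresponding probes of the three additional sets.
    """
    if not 0 <= interrogation_offset < probe_length:
        raise ValueError("interrogation offset must fall inside the probe")
    for start in range(len(reference) - probe_length + 1):
        window = reference[start:start + probe_length]
        probes = {}
        for base in "ACGT":
            variant = (window[:interrogation_offset] + base
                       + window[interrogation_offset + 1:])
            probes[base] = reverse_complement(variant)
        yield start + interrogation_offset, probes


if __name__ == "__main__":
    reference = "ACGT" * 10  # hypothetical 40-nucleotide reference
    for position, probes in list(tile_probes(reference))[:2]:
        print(position, reference[position], probes)
```

Tiling the opposite strand, as in the two groups of probe sets described above, would amount to running the same function on the reverse complement of the reference.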
Multiple target sequences from an individual were amplified from human genomic DNA using primers for the fragments indicated in the listed Web sites. The amplified target sequences were fluorescently labelled during or after PCR. The labelled target sequences were hybridized with a substrate bearing immobilized arrays of probes. The amount of label bound to probes was measured. Analysis of the pattern of label revealed the nature and position of differences between the target and reference sequence. For example, comparison of the intensities of four corresponding probes reveals the identity of a corresponding nucleotide in the target sequences aligned with the interrogation position of the probes. The corresponding nucleotide is the complement of the nucleotide occupying the interrogation position of the probe showing the highest intensity (see WO 95/11995). The existence of a polymorphism is also manifested by differences in normalized hybridization intensities of probes flanking the polymorphism when the probes are hybridized to corresponding targets from different individuals. For example, relative loss of hybridization intensity in a "footprint" of probes flanking a polymorphism signals a difference between the target and reference (i.e., a polymorphism) (see EP 717,113). Additionally, hybridization intensities for corresponding targets from different individuals can be classified into groups or clusters suggested by the data, not defined a priori, such that isolates in a given cluster tend to be similar and isolates in different clusters tend to be dissimilar. Hybridizations to samples from different individuals were performed separately. The Table summarizes the data obtained for target sequences in comparison with a reference sequence for the individuals tested.
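The base-calling rule stated above, in which the target nucleotide is the complement of the interrogation base of the brightest of four corresponding probes, can be sketched as follows. The function names and intensity values are hypothetical. The sketch ignores the footprint and clustering analyses and does not handle ties or low-signal positions.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_target_base(probe_intensities):
    """Call the target base from four corresponding probes.

    probe_intensities maps the base at each probe's interrogation position
    to the measured hybridization intensity of that probe.
    """
    brightest_probe_base = max(probe_intensities, key=probe_intensities.get)
    return COMPLEMENT[brightest_probe_base]


def find_differences(reference, intensities_by_position):
    """Return (position, reference_base, called_base) wherever the call differs."""
    differences = []
    for position, probe_intensities in enumerate(intensities_by_position):
        called = call_target_base(probe_intensities)
        if called != reference[position]:
            differences.append((position, reference[position], called))
    return differences


if __name__ == "__main__":
    reference = "ACG"
    measured = [
        {"A": 100, "C": 90, "G": 110, "T": 950},  # brightest probe base T, target A
        {"A": 870, "C": 95, "G": 120, "T": 105},  # brightest probe base A, target T
        {"A": 80, "C": 910, "G": 100, "T": 90},   # brightest probe base C, target G
    ]
    print(find_differences(reference, measured))
```

Positions are zero-based in this sketch. The second position is reported because the called base T differs from the reference base C.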
From the foregoing, it is apparent that the invention includes a number of general uses that can be expressed concisely as follows. The invention provides for the use of any of the nucleic acid segments described above in the diagnosis or monitoring of diseases, such as cancer, inflammation, heart disease, diseases of the CNS, and susceptibility to infection by microorganisms. The invention further provides for the use of any of the nucleic acid segments in the manufacture of a medicament for the treatment or prophylaxis of such diseases. The invention further provides for the use of any of the DNA segments as a pharmaceutical.
All publications and patent applications cited above are incorporated by reference in their entirety for all purposes to the same extent as if each individual publication or patent application were specifically and individually indicated to be so incorporated by reference.
ATTGCACTGAAG I I I I I GAAATACCTTTGTAGTTACTCAAGCAGTTACTCCCTACACTGATGCAAGGA TTACAGAAACTGATGCCAAGGGGtC/G]TGAGTGAGTTCAACTACATGTTCTGGGGGCCCGGAGATAG ATGACTTTGCAGATGGAAAGAGGTGAAAATGAAGAAGGAAGCTGTGTTGAAACAGAAAAATAAGTC
WI-7718C 91 G AAAAGGAACAAAAATTACAAAGAACCATGCAGGAAGGAAAACTATGTATTAAT
ATTGCACTGAAG I I I I I GAAATACCTTTGTAGTTACTCAAGCAGTTACTCCCTACACTGATGCAAGGA TTACAGAAACTGATGCCAAGGGGCTGAGTGAGTTCAACTACATGTTCTGGGGGCCCGGAGATAGATG ACTTTGCAGATGGAAAGAGGTGAAAATGAAGAAGGAAGCTGTGTTGAAACAGAAAAATAAGTCAAA
Wl-7718b 248 AGGAACAAAAATTACAAAGAACCATGCAGGAAGGAAAACTATGTATT[A/G1AT
ATrGCACTG GTTTTTGAAATACCTTTGTAGTTACTCAAGC[A/C,ηGTTACTCCCTACACTGATGC AAGGATTACAGAAACTGATGCCAAGGGGCTGAGTGAGTTCAACTACATGTTCTGGGGGCCCGGAGAT AGATGACTTTGCAGATGGAMGAGGTGAAAATGAAGAAGGAAGCTGTGTTGAAACAGAAAAATAAG
Wl-7718a 42 TCAAAAGGAACAAAAATTACAAAGAACCATGCAGGAAGGAAAACTATGTATTA
AGGGAATTGTGTTGCTCCTGGAGGAAGCCCAGGCATCATTAAACAAGCCAGTAGGTCACCTGGCTTC CGTGGACCAATTCATCTTTCAGACAAGCTTTA[G/C]AGAAATGGACTCAGGGAAGAGACTCACATGC TTTGGTTAGTATCTGTGTTTCCGGTGGGTGTAATAGGGGATTAGCCCCAGAAGGGACTGAGCTAAACA
Wl-7227d 99 GTGTTATTATGGGAAAGGAAATGGCATTGCTGCTTTCAACCAGCGACTAATG
AGGGAATTGTGTTGCTCCTGGAGGAAGCCCAGGCATCATTAAACAAGCCAGTAGGTCACCTGGCTTC CGTGGACCAATTCATCTTTCAGACAAGCTTTAGAGAAATGGACTCAGGGAAGAGACTCACATGCTTT GGTTAGTATCTGTGTTTCCGGTGGGTGTAATAGGGGATTAGCCCCAGAAGGGACTGAGCTAAACAGTG
WI-7227C 291 TTATTATGGGAAAGGAAATGGCATTGCTGCTTTCAACCAGCGACTAATGCAAT
AGGGAATTGTGTTGCTCCTGGAGGAAGCCCAGGCATCATTAAACAAGCCAGTAGGTCACCTGGCTTC CGTGGACCAATTCATCTTTCAGACAA[G/ηCTTTAGAGAAATGGACTCAGGGAAGAGACTCACATGC TTTGGTTAGTATCTGTGTTTCCGGTGGGTGTAATAGGGGATTAGCCCCAGAAGGGACTGAGCTAAACA
Wl-7227b 93 GTGTTATTATGGGAAAGGAAATGGCATTGCTGCTTTCAACCAGCGACTAATG
AGGGAATTGTGTTGCTCCTGGAGG[A G]AGCCCAGGCATCATTAAACAAGCCAGTAGGTCACCTGGC TTCCGTGGACCAATTCATCTTTCAGACAAGCTTTAGAGAAATGGACTCAGGGAAGAGACTCACATGC TTTGGTTAGTATCTGTGTTTCCGGTGGGTGTAATAGGGGATTAGCCCCAGAAGGGACTGAGCTAAACA
Wl-7227a 24 G GTGTTATTATGGGAAAGGAAATGGCATTGCTGCTTTCAACCAGCGACTAATG j CCACAATGCCTCTCCCACGATGTCAAGGACTCCTGTCTGTCCTGGAGGTGGGAGACAAGGAACCTCCG | AAGAGGAAGCAAGAAAGCCGTACTGTCTATGTTGTGATCCTTCATCGAACAAACTGATGCGAAAACT |TGAATCTGTTACTGAAATGAGGAGAGAAGGACATGTGCTATTGAACTGAGCCAAACACACTGTAAAT
Wl-7310b 234>A ATCCACAGACTCCCTCCCCTGCCCCCATCCCAfA/CIATGATCTTGAGATTTC
GAAGCAACCAGAAAGTATCTTTATCCCCATCTAGATTATGTCTGGGTTCTTCCAGACTCCTACGATTA AATTGTATGCATGTGAACAACTGATGAGGTACTTAGATCTCAGTGCTTTGCAGAAAGAAAAG[T/C]C GTCTACCATTTTCACCAAATTTCGTAGTACAATTTAAGTATCTCTTGTTATCTCCCCTAGGAGTCTAA
Wl-1 95b 1 30 AGTGAGCTGGGGAAGGCAGGATTT
GAAGCAACCAGAAAGTATCTTTATCCCCATCTAGATTATGTCTGGGT[T/C]CTTCCAGACTCCTACGA TTAAATTGTATGCATGTGAACAACTGATGAGGTACTTAGATCTCAGTGCTTTGCAGAAAGAAAAGTC GTCTACCATTTTCACCAAATTTCGTAGTACAATTTAAGTATCTCTTGTTATCTCCCCTAGGAGTCTAA
Wl-1795a 47 AGTGAGCTGGGGAAGGCAGGATTT
CACACAATTTGCAAACACTTCAAAGTGAACGCCCGACATCATCAGCCCGTTAACGTCCAGGCCATGT CCCACATAGAGAACGCTTTACTTCCACGTCTCTCCATACGTAGGTCCTGGTCTCCTATCACATTGCCA C[G/A]TAGCCCTCCCTTCCCTTCCCCCTACAGGCCCTCTTCAGGGCCCCAGTCCCCCTCTGAGACTCCC
1 36 ATGGATCATTCCTGTTTCTGTATCAGGCAGTGATTTAACTCC I I I I I I GT
CACACAATTTGCAAACACTTCAAAGTGAACGCCCGACATCATCAGCCCGTTAACGTCCAGGCCATGT CCCACATAGAGAACGCTTTACTTCCACGTCTCTCCATACGTAGGTCCTGGTCTCCTATCACATTGCCA C[G/A]TAGCCCTCCCTTCCCTTCCCCCTACAGGCCCTCTTCAGGGCCCCAGTCCCCCTCTGAGACTCCC
1 36 ATGGATCATTCCTGTTTCTGTATCAGGCAGTGATTTAACTCC I I I I I I GT
CACACAATTTGCAAACACTTCAAAGTGAACGCCCGACATCATCAGCCCGTTAACGTCCAGGCCATGT CCCACATAGAGAACGCTTTACTTCCACGTCTCTCCATACGTAGGTCCTGGTCTCCTATCACATTGCCA CGTAGC[C/ηCTCCCTTCCCTTCCCCCTACAGGCCCTCTTCAGGGCCCCAGTCCCCCTCTGAGACTCCC
1 41 ATGGATCATTCCTGTTTCTGTATCAGGCAGTGATTTAACTCC I I I I I I GT
CACACAATTTGCAAACACTTCAAAGTGAACGCCCGACATCATCAGCCCGTTAACGTCCAGGCCATGT CCCACATAGAGAACGCTTTACTTCCACGTCTCTCCATACGTAGGTCCTG[G/CJTCTCCTATCACATTG CCACGTAGCCCTCCCTTCCCTTCCCCCTACAGGCCCTCTTCAGGGCCCCAGTCCCCCTCTGAGACTCCC
1 1 6 ATGGATCATTCCTGTTTCTGTATCAGGCAGTGATTTAACTCC I I I I I I GT
CTCTTATTTCTCTGGGCACTGCTTTCTTTGGGGGCAAACTTCCAGTATCACT[G/A]ATACTAATATAA AAACCCTGT GTCTGCTTGCATTTTCAAGATTCAATATATATCCAGATTGTTTTCCCAGCAAAGAA TTTTATTTCTCAAGATATAAAAAATMATATTTAATTTCAGTTTCCTCAAAAGGAATATGAAATT
WI-1126C 52 G TGTTAAAATGCAAATCCAGCTGTAAC I I I I I I GGACTTGTCTTTTATTTCTT
CTCTTATTTCTCTGGGCACTGCTTTCTTTGGGGGCAAACTTCCAGTATCACTGATACTAATATAAAAA CCCTGTMGTCTGCTTGCATTTTCAAGATTCAATATATATCCAGATTGTTTTCCCAGCAAAGAAAATT TTATTTCTCAAGATATAAMMTMATATTTMTTTCAGTTTCCTCMAAGGAATATGMATTTGTT
Wl-1126b 230 AAAATGCAAATCCAGCTGT CTTTTT[T/C|GGACTTGTCTTTTATTTCTT
CGAGCTTGGGATAAAGCAAGGGGACCTTGGC[G/A]CTCTCAGCTTTCCCTGCCACATCCAGCTTGTTG TCCCAATGAAATACTGAGATGCTGGGCTGTCTCTCCCTTCCAGGAATGCTGGGCCCCCAGCCTGGCCA GACMGMGACTGTCAGGMGGGTCGGAGTCTGTAAMCCAGCATACAGTTTGGCTTTTTTCACATT
Wl-7038a 31 G . GATCA I I I I l ATATGAAATAAAAAGATCCTGCATTTATGGTGTAGTTCTGA
ATACGCTTTCTGTCTGTCCCACAGTGGAACCAGCACCCAGGTGGCCAGGGTCGGGCTCCACACA[G η CCCTCAGCCCCTTCAGCTTTGCATGTGTCCATCGGTGACTCAGCACAGAGTTTTCCAACCTCATGTGA CAAAAATACAGATTCCCAGTCTCCTCTCCTGGATTTGGATCTAGCAAGACCAGAGACGGTCCTAGAA
Wl-3429b 64 TCCTGACTGTTAACAAGCACTCCAGGCAATTCTTAAGACCAAGCACGGAGC
ATACGCTTTCTGTCTGTCCCACAGTGGAACCAGCACCCAGGTGGCCAGGGTCGGGCTCCACA[C/ηAG CCCTCAGCCCCTTCAGCTTTGCATGTGTCCATCGGTGACTCAGCACAGAGTTTTCCAACCTCATGTGA CAAAAATACAGATTCCCAGTCTCCTCTCCTGGATTTGGATCTAGCAAGACCAGAGACGGTCCTAGAA
Wl-3429a 62 TCCTGACTGTTAACAAGCACTCCAGGCAATTCTTAAGACCAAGCACGGAGC
ATTTTAGGACAGTGAAAAAAAGGGATTTATAAATAAAATCTATGCCATCCAGGAGGTATGTGTCAGT GTCCAGAACATCCTAGATGAAGTGGCTTCCTTTGGCGAAAGGATAAAGAAGTGAGTGACGGTGACCT GTGAGCCCCATTCTTCT[G/A]TGGGATAAGGTGTCCATTTGTTTCTTGGAGGGTGAAATGCCACATTC
WI-6786C 1 51 TTTTTGGCAGGGGACACTCCTTCTGGGTGCTCTATTGCTCAGTTTCATCATT
ATTTTAGGACAGTGAAAAAMGGGATTTATAAATAAAATCTATGCCATCCAGGAGGTATGTGTCAGT GTCCAGAACATCCTAGATGAAGTGGCTTCCTTTGGCGAAAGGAT[A/ηAAGAAGTGAGTGACGGTGA CCTGTGAGCCCCATTCTTCTGTGGGATAAGGTGTCCATTTGTTTCTTGGAGGGTGAAATGCCACATTC
Wl-6786b 1 1 1 TTTTTGGCAGGGGACACTCCTTCTGGGTGCTCTATTGCTCAGTTTCATCATT
ATTTTAGGACAGTGAAAAAAAGGGATTTATAAATAAAATCTATGCCATCCAGGAGGTATGTGTCAGT GTCCAGAACATCCTAGATGAAGTGGCTTCCTTTGGCGAA[A/ηGGATAAAGAAGTGAGTGACGGTGA CCTGTGAGCCCCATTCTTCTGTGGGATAAGGTGTCCATTTGTTTCTTGGAGGGTGAAATGCCACATTC
Wl-6786a 1 06 TTTTTGGCAGGGGACACTCCTTCTGGGTGCTCTATTGCTCAGTTTCATCATT
GGCTATTTGTAAATGCTTGGTTATTTGACTCCAAAATTGAATAAGTATTGGGGAAGAATCCCTCACCT ACTTCCAMTCCCTTACATATCMTTTTACACAAAGCCCCTAAACCTTCAGTTCCAATCACTCTGAAT TTCATATACCTCCATTATTAAATTCAATACATCATTGCAGAGAAAAGACAACGGTGCCAACTGGGTT
Wl-671 1 b 226 T TGGTTGGTGCCTGCACACCCACA[G ηTGGCAACTAAGTGTAATCTCTAAA
GGCTATTTGTA TGCTTGGTTATTTGACTCCAAAA[T/C]TGAATAAGTATTGGGGAAGAATCCCTC
ACCTACTTCCA TCCCTTACATATC TTTTACACAAAGCCCCTAAACCTTCAGTTCCAATCACTCT GAATTTCATATACCTCCATTATTAAATTCAATACATCATTGCAGAGAAAAGACAACGGTGCCAACTG
Wl-6711 a 36 T Ci - GGTTTGGTTGGTGCCTGCACACCCACAGTGGCAACTAAGTGTAATCTCTAAA
CCTCCTCTGAG GCAGTTCTCTGAAAGACMTGGATTGTGGAGCATACTGMGACTATTCCTAMTGGCTATTTGTGTTG
TTTGTGTTGGG ATTTTCTGAAT GGTGGTCMG[A G]CTATTCAGAAMTCTCAGAGGAGGACAMTGATAGTGCACTGCAGCCAGCTCG
WI-11909 78 G TGGTCMG_ AG GACTGGCTTGCAAGAGTC
TCCTGTAMGC
CATGAAGAGT CAATTTTATAT AAAAATACCATTTAGCATCMTTGCCCCMGTTTGGCAGGCATGMGAGTGGGCAGTTCAΓT/G]GTT
WI-1 1806 60 GGGCAGTTCA ACTAATAA TTATTAGTATATMAATTGGCTTTACAGGMGCATTATGG
CCCTAGTGMTACMCCTTTGTCCTGGAGAC[C/A]CCAGCTAGTCTMGAAMCTTCCTAGGCTGAG
WI-11946 31 CTCTCTTGGGMTCTMGATAMGMCTGAGATCCTGGGMGMGGGM
TGMGATCAG
ATCTCTGGTTT CAGCTGTGGTG ACAAAATTCACMGTACAACACTGCTTATTTTCTTGCTTGMGATCAGATCTCTGGTTTATTTM[T/
WI-11965 65 G ATTT MTGTTGAT G]ATCMCATTCACCACAGCTGMGGAAATTAMCTGMCCT
TGCCCTACTAC TGAGGAAATGT ACCTATTTTGMACTGCAGAAAGGGCAGGACAAMCAAATCACTTCATAGATTTTTCTGGGAMTAT
GCTTTTAAAA GTTACAGTATT TGCCCTACTACGCTTTTAAMAA[T/A]AATAAAAATACTGTAACACATTTCCTCATTTCTCTTACGA
WI-11027 90 A TTTATT ATACTTTC I I I I I GATATTGCAMTTCTATGGCATACACAGAGGCACCTCCTCMTGCCCTG
TTCTGCTGAAGATCACAAMCMTTTCMCCTCTGTGGTTCAAMTMTTTMGGATCTTGTACCTTT
GTGTTTATTTTCTGTTTCAACTAAGGA[C/ηAGACTTCAGMGGCATAGCTTCCCTTGTAACGTTTTT
WI-1 1049 95 AAACATCTTTTTCATTTGTAGGAAGGMCATTTCAAAAGCCCAA CΛ )
AAAAGGACAG TTTCCATCTTA CCAGATATCA TTTCATTTCTG CAACATTTATCAAACATGGTAGGGAAMGTTCTCACTCTGCACTATAAAAAGGACAGCCAGATATCA
WI-15488 69 AC TMC AC[CtT]GTTACAGAAATGAAATAAGATGGAAAATm AACAAATTG
MCAGTTAAT GAMCACATC GGCTGGTGAM TGCTCAATTTMTGTGATAATCTCCMCAGTTMTGAAACACATCCGTA[A/G]GTATGACATCATTT
WI-13654 49 CGT TGATGTCAT CACCAGCCAGCTACTTCATGTGGCAGAAMGGTMCCTTTTCCCCATTTTACAGACAAMCCAGT _
ATGAGACCCTGCTTTGMCGTTAMCGTTTTGGMTMTGGAAAAGGAGCTAGGACMTTCTTGCTT
Wl- TCMGTAAMTTGTGACTGAGCAGAAMTCAGCCAGCTATCTTGGGTGCAGAGAGGTACTCCMGTA 1 1 070b 1 3_5j C| C[C, ]GTGGGGGTTCTGATGACTTCCACGGTCACTGGGGATCCMCAGMGGGM
CAGAAMTCA ATGAGACCCTGCTTTGMCGTTAMCGTTTTGGMTMTGGAAMGGAGCTAGGACMTTCTTGCTT
Wl- GCCAGCTATCT TFGGAGTACCT TCMGTAAMTTGTGACTGAGCAGAAMTCAGCCAGCTATCTT[G/ηGGTGCAGAGAGGTACTCCM 1 1070a 1 1 0 | G T T CTCTGCACC GTACCGTGGGGGTTCTGATGACTTCCACGGTCACTGGGGATCCMCAGMGGGM
MTCTTTTATATTTCCAGCTGTTGAGACAGTATTTTTGAGGGCTGATGTTACCTCTAGCGGCGAMCC AGAGCCAGCTATTMGCAGCCAGMAGCTACAGTMTTGMTACATGACCATT[T/C]CTCTTTTAGC
WI-12020 1 21 T C -- ACGTTCTTTGTTCTCCTC
TGCTC I I I I I ATTTCACGTTTCACMCACACGCCGTG[G/ηTGGCACAGTCTACCAMGTGCCCGCAG CGCCACGCTTGGGCCGGMGGTCTCATTCTGTTCGTCTCTATGGACTGATTGMTTTGGGATGGCCAG CTCCAGMTGTTCCACGTGGGGGCACTCTGTGGGCAGAGAGGCTGAGCCCTTGCCCACACTGGCACCA
WI-9617 37 G MGAGGTTGCACGATGCAGCTTGCAGTGGGTCCMGCCGGGTGTGCTGTG
MTGCTGGAGAAMCATCMCATTGAGTTGACATTTGTTTTGCTGMGTATAGCTACCATCCACTAT CATGAATTTTTGTTTCATTACAMTGATAGAAMGCCAGATTCTCAAAATAAAG[T/G]ATMTTCTT TGTATTAAATAMTGTTTATAAATGTTTATGAAGCTCATTACATTATC I I I I I I AAMAAGTAAAAA
WI-9657 1 21 TTTTAGMCATATGACGCTTTTCATMTTMTGCTTTTGATATAGATTTGAGG
AAAAATTAAC CAGGGTCTTGCTCTGTCTCCCAGGCTAGAGTGAGGTGACACMTCMGACTCACAGTAGCCTCMCCT
Wl- CCTCCCMGTA CAGGTGTGGTG CCTATGCTCMGCCAGCCTCCCMGTAGCTGGGACTACAGGCATGT[G/C]ACACCACACCTGGTTM 131 19b 1 14ι G ' GCTGGGA T I I I l I I IM I I I I I l GTMAGATAGGGTCTCACTATGTTGCCCCGTCTCAAAAMCMACCMCTMC
CAGGGTCTTGCTCTGTCTCCCAGGCTAGAGTGAGGTGACACMTCMGACT[C/G]ACAGTAGCCTCA ACCTCCTATGCTCMGCCAGCCTCCCMGTAGCTGGGACTACAGGCATGTGACACCACACCTGGTTA
Wl- A l I I I I I I MI I I I I I GTAMGATAGGGTCTCACTATGTTGCCCCGTCTCAAAMACAMCCAACTAA 131 19a 51 C _
ACAGGAATCTGAAAGTTACCMGGCAATTTTCCCTTTTAGGATCATAMGACTACAGACTTMGCTT 3
TCATAMGAC TTAGAAATTTT TTTT[C/T]C I I I I I CCATATMTACACAAAATTTCTAAATATCCTTAAAAAAGAAAATATAAATAGT TACAGACTTA GTGTATTATAT TTCAGTATGTTATGTAGAGTCACATACTATGGCAAMATATTTTATTMTTGAGGGMTAGGCCMT
WI-13112 71 AGC I I I I I GGAAAMG
TGTTAACATTTTTATTGGTACGTGCTCTCAGTACM[C/A]AMCAGCATCAGTAGTGTACACTTTGAT
CAMGTGTACA MAMGGMTTTTTAGCTTAGTAGMMGMAGCCCAMGGTCAGAAGTATMTGMTATGTACAT
TGGTACGTGCT CTACTGATGCT CTTTATGGAMCTGTTTGTGTGACCATCTTTATCTTCCCCTGTGGATGAGATGTATGCACACACMGT
WI-12988 36 CTCAGTACM GTTT AAA
TGCTATTCATGACAGACACGTGAGACAMTATTCTTATTTTACAGATGGAMTAGACCCAGACATTA
CTMTAGTGG TTCAGTACTTTMCCACTAATAGTGGMCCCTGAGACTTTA[G/A]ATCTGCAMGGGGTTTAATAAT
Wl- MCCCTGAGA CATTATTAAAC GCMATATCACATATATTTCCA I I I I I AACACCATATTTAAGTTTTCCATTTTCTTMTAGAMATGA 13020a 1 08 GL CTTT CCCTTTGCAGA TAMAAATGTTTTCCCCMTAT
TGTATAAAMATCCMCTTGTTCCACMGTACATATGTCCTATGATTTTATGCATACATCCATATAC
CCATATACAT ATATATCAAGGTMAGTCCA[A/G]TACAMMMCAGCATTTCCTATGGCCAGTGTFCTACAGAAGT ATATCMGGT GCCATAGGM MGACTGTGCAMCTTTATCGTATAGTCAMTGAGATTGCACACTMGGCAGGATGAGGCAGMGCA
WI-12837 87' MAGTCCA ATGCTG I I I I I AGTTGTGTCCA
GTCCTCAGGCCCTTCTCTGGCTGCAGAGCCGTCTTCTCAGGTTGCCTGTC[G/C]TCTCCTGGCCTCTAG TCTTCCCTGCTCTCCGAGGTAGAGCTGGGTATGGATGCTTAGTGCCCTCACTTCTCTCTGTCTATACCT GCCCCATCTGAGCACCCATTGCTCACCATCAGATCMCCTTTGATΠTACATCATMTGTATTCACCA
L4261 1 b 50 CTGGAGCTTCACTTTGTTAC
GTCCTCAGGCCCTTCTCTGGCTGCAGAGCCGTCT[T/C]CTCAGGTTGCCTGTCGTCTCCTGGCCTCTAG TCTTCCCTGCTCTCCGAGGTAGAGCTGGGTATGGATGCTTAGTGCCCTCACTTCTCTCTGTCTATACCT GCCCCATCTGAGCACCCATTGCTCACCATCAGATCMCCTTTGATTTTACATCATMTGTATTCACCA
L4261 1 34 T CTGGAGCTTCACTTTGTTAC
TGAACGTGTGGTTAAAACTAGGCMTTGGTTMMATCMTTTMMAACAGGCCTAGAMCAGTG
TGMGAAATG ACCACACCTCMGCAATGATTATCCCTAGCACTCAGATTATGTTCTTGAMTACCATTTTCTGCTTTC GCTGATACCA ATGTGCATTTT AAMGAAAGACATGAGGGCTTCTTGMGAMTGGCTGATACCMG[CtηCTGCAGTGAAAMTGCA
Wl-1172b 1 79 A TCACTGCAG CATGATGAGCCTGGMCATGTTGT
TGAACGTGTGGTTMAA[C/A]TAGGCAATTGGTTAAAAATCAATTTAAMAACAGGCCTAGAAACA GTGACCACACCTCMGCMTGATTATCCCTAGCACTCAGATTATGTFCTTGAMTACCATTTTCTGCT TTCAAMGMAGACATGAGGGCTTCTTGMGMATGGCTGATACCMGCCTGCAGTGAAMATGCA
Wl-1 172a 1 7 CATGATGAGCCTGGMCATGTTGT
AGAGGCAGATTGGAAGTGTGAAAAAAATGAAAGM[G/C]MGMAAAAAGAGTCTAAATATTCAG
GCAGATTGGA CACTTACATTT MATGTMGTGCTGCCCTCMCTGTTCTTTACCCACTTMTTCTGCMTTTTGAAAACTAGATTGMT AGTGTGAAM CTGAATATTTA TCCTTTGCAAMCCCTTGCATCATGGATACCCGAGTTAMCCGTTMTTAAMGACATTAMCATGG
WI-1 177 35 G O A GACTCTTT CCTGGTG
TCCATGGTTTGGTTGCTACTGACTTTGTTAGCCTTACTGCCCACTATGCATTGGMCATTCCCATATTC CMCTMGCAGGAGTGTTCACMTMACMCATAGGCTCTTTATTCTCCTTCTTTCATTMTTTTCTT TCAC[G/A]TTATTCCCTCACCCTGMCGCCCTTCTTCCTTCGTAGTGACATTTTAAMTCCACTTTAC
Wl-1231 b 1 41 I G ACATTCGGACC
TCCATGGTTTGGTTGCTACTGACTTTGTTAGCCTTACTGCCCACTATGCATTGGMCATTCCCATATTC
GGCTCTTTATT CMCTAAGCAGGAGTGTTCACMTMACMCATAGGCTCTTFATTCTCCTTCTTTCA[T/C]TMTTTT
CTCCTTCTTTC CGTTCAGGGTG CTTTCACGTTATTCCCTCACCCTGMCGCCCTTCTTCCTTCGTAGTGACATTTTAAMTCCACTTTACA
Wl-1231 a 1 26 T!C A_ __ _ AGGGAATM CATTCGGACC
ACATACATAT GMGGCAGGACTGTGTTTTGGAGGACMMAGTAAMTC I I I I l ATATCTTTA I I I I I I MTTTTATT [CCATTATACA GACCTTTCTTT TTTTTTCAGGCATATAGACATACATATCCATTATACMCAGMMG[G/C]GGGCTGGAAAAGMAG
WI-472 1 1 4 G C ACAGAMAG TCCAGCCC GTCMGTGAGATTTCAGATATTCTTAAATGCMGGCTGACAMTTTGGGCTTGATT
TΓTTTGTTTΌCTCTGGACACCCACTGCTCCCAGGATGAMGGAGAG[G/A]MTGAGATCAGTTTTGGA
WI-7593 46 CACTTCCTCTTGAMTATAMGMTCMCMGTFACAGTCATGTTGGGGACTTCTTCTCTCTCCM
AGTGCATCTTGGGGGAMGGGCTCCAGTGTTATCTGGACCAGTTCCTTCATΠTCAGGTGGGACTCTT GATCCAGAGA[A G]GACAMGCTCCTCAGTGAGCTGGTGTATMTCCMGACAGMCCCMGTCTCC TGACTCCTGGCCTTCTATGCCCTCTATCCTATCATAGATAACATTCTCCACAGCCTCACTTCATTCCAC
WI-6962 78 CTATTCTCTGAAMTATTCCCTGAGAGAGMCAGAGAGATTTAGATMGA
GCAGAGMGAGMCCATGCCAGGGGAGMGGCACCCAGCCATC[C/G]TGACCCAGCGAGGAGCCM
MGGCACCCA GCTCCTCGCTG CTATCCCAMTATACCTGGGTGMATATACCAAATTCTGCATCTCCAGAGGMMTMGAMTMA
WI-7059 43 GCCATC GGTCA GATGAATTGTTGCAACTCTTAAAAAM
CACTTCACTGA MGACACCAT TCTACTTTCTG AGCAGCCATCACATGATCTGTTTTTCACCACTTCACTGAMGACACCATTTAT[A/C]TACCCMGGG
WI-9063 53 CCCTTGGGT CAGAMGTAGMCTTACTATTCATTAMTGTTTGACACMTTGGMTTGTC
MGGGGCATTGAGACTATAMGCAGTAGACAATCCCCACATACCATCTGTAGAGTTGGMCTGCATT CTTTTMAGTTΓΓATATGCATATATTTΓAGGGCTGCTAGACTTACTTTCCTATTTTC'FTTTCCATTGC TATTCTTGAGCACAMATGATMTCMTTATTACATTTATACATCACC I I I I I GACTTTTCCMGCCC
WI-7079 293 TTTTACAGCTCTTGGCATΠTCCTCGCCTAGGCCTGTGAGGTMCTGGGAT
GGTAAMGTT GACAGAI I I I I o CTTTTTGCTCT GACCTAGTTCC TGGATGCCGAGGTAAMGTTCTTTTFGCTCTAAAAGM[A/G]AAGGMCTAGGTCAAAMTCTGTCC 1
WI-9074 38 AAAAG TT GTGACCTATCAGTTATTM I I I I I MGGATGTTGCCACTGGCAMTGTMCTGT
GGAGTTTGCCCCTTCCTMGGGMGGAGATCTTTATCTTTCTGGTTGGCTTGACCAGTCACGTTGGGA GMGAGAGAGAGTGCCAGGAGACCCTGAGGGCAGCCGGTTCCTACTTTGGACTGAGAGMGGGAGCC CCAGGCTGGAGCAGCATGAGGCCCAGCMGMGGGCTTGGGTTCTGAGGMGCAGATGTTTCATGCT
Wl-7104b 249 GTGAGGCCTTGCACCAGGTGGGGGCCACAGCACCAGCAGCATCTTTG[CtFJF
GGAGTTTGCCCCTTCCTMGGGMGGAGATCTTTATCTTTCTGGTTGGCTTGACCAGTCACGTTGGGA GMGAGAGAGAGTGCCAGGAGACCCTGAGGGCAGCCGGTTCCTACTTTGGACTGAGAGMGGGAGCC CCAGGCTGGAGCAGCATGAGGC[C/A]CAGCMGMGGGCTTGGGTTCTGAGGMGCAGATGTTTCAT
WI-7104 1 57 GCTGTGAGGCCTTGCACCAGGTGGGGGCCACAGCACCAGCAGCATCTTTGCT
CCTGAGCCCTC TGTAGGGCTGA CATACMTGAGAGCCCTGAGCCCTCMGMCTCA[CtηGCCAGCTCAGCCCTACACCAGTTTCCACC
WI-8974 34 AAGMCTCA GCTGGC TGGAGTTCATGCMGGGCMMGGCAGTGCCATGCMGCTGTTTM
GCTTACAGGAG
CCTMGCATTG AGACTAGACA CTGTGAGGGTGACGTTAGCATTACCCCCMCCTCATTTTAGTTGCCTMGCATTGCCTGGC[Cπ TC
WI-9161 61 1 CCTGGC GGM CTGTCTAGTCTCTCCTGTMGCCAMGMATGMCATTCCA
CCCTGTTCCCATGCTGACCTGTGTTTCCTCCCCAGTCATCTTTCCTGTTCCAGAGAGGTGGGGCTGGAT
WI-9014C 93 lT Cl- GTCTCCATCTCTGTCTCMCTTTAΓF/CIGTGCACTGAGCTGCMCTTCT
CCCTGTTCCCATGCTGACCTGTGTTTCCTCCCCAGTCATCTTTqCtFJFGTTCCAGAGAGGTGGGGCTG
Wl-9014b 44 GATGTCTCCATCTCTGTCTCMCTTTATGTGCACTGAGCTGCMCTTCT
TCTGAGAGAMTGACTTGTGGGAGACACCCTGCAGATCCTCATGGGTTTGTGACAGACCCTGCGTGCT CAGTGCCCTTTMGTGCATCCCGCTGTGCTGACTTTGAGTGGGATCMCATCTGTCCTACGGGTCCCC TCI I I I I IGGCCCCAGTATTCATGGCAGGGTTTGTTGGACACCTACTAGCTTCCCTTCCCATTCMCAC
Wl-7023b 206 A[C/A]ACACACATTCTTGCTCTACCCAMGCTCTGGCTGGCAGCACTM
TCTGAGAGAMTGACTTGTGGGAGACACCCTGCAGATCCTCATGGGTTTGTGACAG[A/C]CCCTGCGT GCTCAGTGCCCTTTMGTGCATCCCGCTGTGCTGACTTTGAGTGGGATCMCATCTGTCCTACGGGTC CCCTC I M IT IGGCCCCAGTATTCATGGCAGGGTTTGTTGGACACCTACTAGCTTCCCTTCCCATTCM
Wl-7023a 56 CACACACACACATTCTTGCTCTACCCAMGCTCTGGCTGGCAGCACTM
CTGAMTCCCCCTCTCTGCCCTGGCTGGATCCGGGGACCCCTTTGCCCTTCCCT[CT]GGCTCCCAGCC CTACAGACTTGCTGTGTGACCTCAGGCCAGTGTGCCGACCTCTCTGGGCCTCAGTTTTCCCAGCTATG AAAACAGCTATCTCACAMGTTGTGTGMGCAGMGAGAAMGCTGGAGGMGGCCGTGGGCCMT
WI-7093 54 GGGAGAGCTCTTGTTATFATTMTATTGTTGCCGCTGTTGTGTTGTTGTTA
ACATATCTGAAAAATGTTGAMGCCTMGCCAGGAATAMAGAAMGTAGAGATMTAATCA[G/A]
WI-9171 62 TTCTTTACMCCGATGGTMTTMGCTTGTATTCACMGACTTCATGC
CTAGGACCCC TCTAGAGGGTA ATTCTCCTATT TATAGGACAGG GTGTGAGACCATCATGGTGCCAGTCTAGGACCCCATTCTCCTATTTA[T/C]CAGTCCTGTCCTATATA
WI-9174 47 T ACTG_ CCCTCTAGAMCAGAMGCMTTTTTAGGCAGCTATGGTCAMTTGAG _ _
CAGAGGTCTTG MGGCCAGATGCACATCCCTGGMGGACATCCATGTTCCGAGMGMCAGAT[A/G]ATCCCTGTATT
CCATGTTCCGA AMTACAGGG TCMGACCTCTGTGCACTTATTTATGMCCTGCCCTGCTCCCACAGMCACAGCMTTCCTCAGGCTA
WI-7753 52 GMGMCAGA A AGCTGCCGGTTCTTAMTCCATCCTGCTMGTTMTGTTGGGTAGM
AMGGGMAG
CCACTTCTCCC TCTGACCTAGG MAGMCTACAGAGGACGATGTCCAAMCMAAMTGGCATCACCTGTCAAAMTGGAGTTCCACT
WI-9186 76 CGCA T TCTCCCCGCA[G/A]ACCTAGGTCAGACTTTCCCTTTCATCTT
AGMTATTGT
CTGCCTTAMG GGTGTGTGTGG TTGGACAMCCTAGMTTTTCTCCCTFTATGTATCTCTATCGATTGTGTAGCMTTGACAGAGMTM
WI-9193 94 G, CA TAGGGGG CTCACMTATTGTCTGCCTTAMGCA[G/A]TACCCCCCTACCACACACACCCCTGTCCTC
TTTGGATTGATATCGTGAMTCCTCAGCCGAGAMTTGGGCTGGATTG[CtF]GCTTTGGTTMTACAT
WI-9015 48 CTTTCCCTMAGMGATAMCACAAMTCCATTCCAGGTAGCTCGGCACCMCTMGM
GGAGCCAGGAGACAGCAGGGTCTGAGAGAGGAGCCAC[A/G]GTCCCTMTGACACCCACTCCTAGCC
GGTCTGAGAG GGAGTGGGTGT CTGAGGCTCGTGCCCCTCAGACTGGGGMGAGTCCMGGMGGGAGGGAGCAGCCACTCCTCMTGC
WI-7254 37 AGGAGCCAC CATFAGGGA TCMTGGCTCCCCTGMATCMGACAGG
CMGAGAGAG TGCAAAGAAA CCAGGAGCACTAGAGAGGGAGGGGGMGAGCAGMGTTAGAGAAAAAMGCCACCGGAGGAMGG AGAGGAMGA GMTGAAAGTT AAAAAACATCGGCCAACCTAGAMCGTTTTCATTCGTCATTCCMGAGAGAGAGAGGAMGMAM
WI-7424 1 31 AAAA G [T/A]ACAACTTTCATTCTTTCTTTGCACGTTCATAMCATTCTACATA
TCCTGCMGMGTTCTCMGCC I I I I I GATTTTTGTGCMTMAGTACAGCTTTGCATMGAGTGAM TTGGGCTAGCTTAMTGGATCCATAMCTTTCTTCTMTTTTMGTGAGA[A/C]TCTTTTAMCACCT GTTMATTTMTGTAGCAGTCTGAGAATCTAAMTTATGTACCACTCGTTTATFTGTTCATTCATCCA
X86400 1 1 8 TCCCTTTTCCCATGMTATTTCA
GTGGCCACTACATGTTATAGAMCCATCATCTTGTCACACAGCACAGTCTATGMTMMGGCTGAG TTATCACTMGCAGGAGAAAMGCATTAAAMGTGTCCCATTMMGGGACTTTTMTCMCCTAA TMACTCTMTTCTGCTGACTTTTTAMGATCTMGGTCATTFTMTACATGCTGAAMGGGTCACA
WI-8053 242 T ATTAATTCTTTGATCTTTTTTACTCACTGTTAACTTATATAA[T/A]TTCAGMC
TACACMTGMTTGCTTTTATTTCGGTATGCATCCACATTTCAGCATTTAGTGGTCCTGMCAGCMG TGGMAGACGCAGCMTTTGCCAGGAGGTCMGCCCACCMTTTCGGGGATCTGCTGTGCACACCGG GTTCCTTCTTMTCCCTGCTGAGGATCTTG[G/A]GMGCAGCAGCAGCACCAMACCMGGCATGCA
WI-6190 ' 1 65 CCGGATTCMGGTTCTTTTTGTTCCAGTTGTCAGATTCCAMCTAGACCCCA
MCAGTCACCACCMCCACATGACMCTCGCCAGGCMGGCCTTGCTTCCCTCCCTCCTTTGCGTCCC ATGTGCCTAGTCAGCMGGTCGGGGAGGCACCGATGTTAGCTTCGCCCAMGGGAGTATTACAGAGA GAGGCTTGGGAM[G C]GGMGGMACCTGGACAGGCTTTTCAGCACTGAGAMTCACTTAMACTG
WI-6275 I 1 48 G Cl ATTTGCTTTCAGTMCTGGTATGTCTGM
ACCMGAGATCAGCTGTCTAMCAGCAGCT rGATTGT[G/ηGGGCTTCCTGAMGMACCTTGC
TGACAGCTTCTCACTGACCTGCAGGACGGMCCGTACCTGAGAGGGGATGGGGGCTCTCTCACAAM GMTATTTGGGGCAGMCCCTGGMCTGGCCACCAGGGACATCCCAMTATCCCCTCCTCCTCAGGG
WI-6421 41 CTCACCCCGACATCCTCAGCCAMTGMGGCTCTGM
GGGTGAGACGGGTTTATTGTGCACATFTACACAGCGTCACAGCGTCTGGGCTGGCAGCGGCCATGCTC CTGTGGTCGGGCTGCTCTACMGGGCGTTCACTTTTCTTCACCACACTATGTACAGTCAGTGCTCCM GGTGATGGGCTACAGTGCTGCATCAGTGAGTCTGTACACACATTTTTACATAMTTACACACGACTC
WI-6905 21 51 T j A ATACATGMAAA[T/A]AGAGCCTAAGGGCCTGTATTTTAATGAGAAMAAA
MCTTGTTTACAAMTAGGCTTTGCAMCTTCATTACTGMTTGTMAGTCMTGACTGTGTTGTTTT TAAAATATGTACCMGGAMTACAMTTGGATMTGATCATTTTTCATGCTCAGGAGAGMCAGCAC AGAAATAMGGATACTGCACMGGTGCMGGAAACCGGMCCCATTGTGTACACTGTCTTCACACAG
WI-9420 202 G A' — [G/A]GCATTCTTTCTCACCTTMCTGCAGCTGTGCMGATGCCTCAGTGTG
TTTCTAGGCTGTACAGTCTGATGCATGA I I I I I I I ATAMTATTTCATACTCTTGTGMTTTGGATCTT TTTACTTTGAGCATATATTTTAGMTATGTGT[A/G]TGTTAMGGATCTCCACMTGTCTGCAGTGTG MGGCAGGTTCATTGTGGMTAGTTTMCAGTCAGGMGGCTAMCTGGTCAGTATTMTGTGTAGC
WI-7805 1 01 G CCTACCAMAATAGCCAGTAGTATCTGAAMTGMAAATAMTGMGTAT
GGCCAGGAGATTAGCMCMGGATTCATTCTGTTACTTACTTGCCCCTTTTTATCTTTCCCTCTTGCCC CAGTCCCTTCTCTCCAGCTTCATGTGMGCTCTGCACAGACMGACACTCAGTGTCCTTGGCAGTGCT [G/T]CTACTCCTCAGGTGCAGCATACATMCCAGTMGAGACTMATCTGCMTATATMAGAGCTC
WI-7416 1 37 CTACAMTCAGTAACATGMGMCACTCMMATTGGCAMTGTCATCAG
ATTTGMGATTTGGAGGGCTTTGCAGAGGAAMTAGATTTCMTTGGATCCCCAAACTATMTGACA AG I I I I I MTTAGGTGTGATCMGGCTFCTMMGTGAMTGCMGTTGTTACCAGTAMGTTTATA TCTTCCATTCAGCCCAGCTCATTTGCCAGAAMTTCAGGTGAGTGGATTGGCCAGACTATCTGGCMG
WI-140 252 GATGAAAATTTTAGTTTAMMJGTGTCATTTGTCTGTAπGGCAπ
GAGGTCTTTCAGCMCATGGMGCCCTACTGCTTCMCCCCGAGTTCCCCGGATCMGTGCTGGCACC CATGATGGAAACTCTTGCCATGGTTTTAGTACCCTGGACCMGTAGTCATTCCATCCTGACTTTAAM TTCTMACAGCCTTTGATGGGACMTCTCTGCTMAGACTMCCACTTCCTTATCTTATCTTCAGCTA
WI-198 21 8 CCTGCTTCCCTTTC[C/ηGTTTAACMAGCATAGMTATTCTGMCMCT
TTCATGGTCCCAAGACAGATTTTAMGAAAGAAMTAAGCCTCATCTCCTMCTATGACTTGGTCGG MGCCMGAACCTACTTCAACATTTGACCCATMCCTTCTCTTGAGATGATGGGCTGACTTTTTCMT GCATGAGTTTG[T/C]CCAAAGGCTTGATGGGAAMTCTCMCATTTGTTACCTMGMAGAGGATGT
WI-205C 1 46 ATCTTACTTTGTTTAAMMCTGCATATGCCTTTA I I I I I GTTTTAGTTCCC
TTCATGGTCCCMGACAGATTTTAMGMAGAAMTMGCCTCATCTCCTMCTATGACTTGGTCGG MGCCMGMCCTACTTCMCATTTGACCCATMCCTTCTCTTGAGATGATGGGCTGACTTTTTCMT GCATGAGTTTG[T/C]CCAMGGCTTGATGGGAAMTCTCMCATTTGTTACCTMGMAGAGGATGT
Wl-205b 1 46 ATCTTACTTTGTTTAAAAMCTGCATATGCCTTTATTTTTGTTTTAGTTCCC
GMGACTGAGTTTCCAGGAGGTTGCAGCCGTTTCTCTCGGGCCATATGGCTMTMGGAGCTTGAGCA GGGATFCMCCTGTTTGCMCCCMGTNCTTTCCMGAGGTCTCAGACTACCTCCTCCATCTCCCCCT CTCCCCCACMCACACAMTACAGAGATT[G/C]MTTCAGGAGCCAGTTTCTAGGTGGGCTTTGAGC
WI-234 1 65 MTCATACACAGTMTCTCTTGGTGCTTTAGTTTTCTCAMTGGGMATGG
AGCTTTTGAMTCCAAMACCACAT[A/G]CTTGACTCTCTTATCCTCCTCTTGTTGTMCATCTATCC CTGAGGCAGAAMTACAGMCACCCTGTGGCTGCCTGMCGGAGGMGGATGGGGGCGGGGAGACAT CGGTCMTGTATCAMGCATCTCTCTGCCTGAMGACCTCTCCTGAMGACATGAGCTATTAGGAGC
Wl-276b 25 A G — TCTGGCMGGGCTTTGTCTTATCCTCCTTGCTATCCCTGATGACTGGGCAM
TGTTCTCTGGTCCAGGCACCGGGCTMGTCTTGTCTGCATMTGGMTMTCMCTGGACMCCCCNG CTNAGGTAGGNTACCTNGGCMTTAGCCCCATCTTACAGCTGCAMAGAGG[C/T]GCTCTGAGAGGT AMGTGCCCTGCCCCMCGCGCACMCTAGAGAGCAGCCAMCAGGTGTTTGMCCCAGCTCTGCCT
WI-1900 1 1 9 GACTTCAGATCTGTGTGCTTMCTGCCATGAGAMCCACTTTTCTTTGCTCC
ATTCCAGTTTCACAGTGGGCACAGGAGTCAGATTAGGGCTMGTTGGGGGGACAGGATGCACAGCGT GTTGGCTCAGGATCTCTGGGAGGTGGCACCTGTGACCTGGGCTMNCATGCTACTTTCAGAGTCMGC AGCMGCCMTGGGTAGGGAMGACCAGCC[C/T]CTCTGMNCTGGGTCCCACGTGGAGATAGTGM
WI-1943C 1 65 TACAGGGCACCGNTGAGCATTCCAGATGACTCCAMGCCCCGGCTGGAGTAT
ATTCCAGTTTCACAGTGGGCACAGGAGTCAGATTAGGGCTMGTTGGGGGGACAGGATGCACAGCGT GTTGGCTCAGGATCTCTGGGAGGTGGCACCTGTGACCTGGGCTMNCATGCTACTTTCAGAGTCMGC AGCMGCCMTGGGTAGGGAMGACCAGCC[C/T]CTCTGMNCTGGGTCCCACGTGGAGATAGTGM
Wl-1943b 1 65 TACAGGGCACCGNTGAGCATTCCAGATGACTCCAMGCCCCGGCTGGAGTAT
ATFCCAGTTTCACAGTGGGCACAGGAGTCAGATTAGGGCTMGTTGGGGGGACAGGATGCACAGCGT GTTGGCTCAGGATCTCTGGGAGGTGGCACCTGTGACCTGGGCTMNCATGCTACTTTCAGAGTCMGC AGCMGCCMTGGGTAGGGAMGACCAGC[C/T]CCTCTGMNCTGGGTCCCACGTGGAGATAGTGM
WI-1943 1 64 TACAGGGCACCGNTGAGCATTCCAGATGACTCCAMGCCCCGGCTGGAGTAT
CCAGGTGAGGCTGAMGMGGMGGAGGCMTTGCTGTTGGAGTGAGGGATTCTGGAGMGCACCCT GCAGAGCTTCATTCTGTTTTCAAMGTGTGCCATGCANGGTCNTCTGGGTTGTGAGCTCATNGCTGAG TTATCACAGCTCCTGATGACAGATCATGAAAMTAGGTACTTCCCMGCTCTGACTAGACCTTGGCA
WI-1960C 270 GTTGCMTTMATCCGTGGTGTCTGAAMCTTAAAMTGCACCTCCCMCTTT
CCAGGTGAGGCTGAMGMGGMGGAGGCMTTGCTGTTGGAGTGAGGGATTCTGGAGMGCACCCT GCAGAGCTTCATTCTGTTTTCAAMGTGTGCCATGCANGGTCNTCTGGGTTGTGAGCTCATNGCTGAG TTATCACAGCTCCTGATGACAGATCATGAAAMTAGGTACTTCCCMGCTCTGACTAGACCTTGGCA
Wl-1960b 270 GTTGCMTTAMTCCGTGGTGTCTGMMCTFAAMATGCACCTCCCMCTTT
CTGATGCCMGTGCAGCTTAGAGTNAGGMTCCAGAGAMGTNTTTGGATCTGGTMGTAGGAGTCA TTCTGGGCATTTCTTCATAGAGTNTTG I I I I lAGTCTCGTMTMTACTGTTGCCCTAGGMGGTTGTT
TTTCCTACTGCGTCTGTGAMGCCTTTCCCCATCGAGTGATACAGTACTTTCCAGTTATGGAGATTT[
WI-1977 203 /C]TMCMTCMACACTGGCTGAGGCTGTTGG
AMTTCTAGMGCCAGMGTCAGCTCACGATTTATAMGTTGMGTMATGCATTGTAGTTTCATGT TTTCTCTTAATTCTGCACAMACTAGCTAAAMTC[T/C]TTTMATCAGTTACCAGAGGCAATACCT GGGTTMTGTMGCACTCAMAGTTATGTAGAGTAGCTGTCTCTGAGTCAC I I I I I I CTACTCTCATT
WI-2012 1 02 | GGCTTCACCMTGCTTCCACTGGATC
TTGGTTGGCATTTTAGCCTCATAACMCTATTTACMTCATAATTGTTACTCTTATTTTACMACMG MAMTGAGGCTTMCATCACACTTCTGCTTAGTCGCAGAGCCMGATTTGMCCCAGGMTCCATT CACCGGTAC[A/G]TGCTACCTGGGTAAMAATGTTTAATTAAMTCTATGGCATTAGATTTCAMGA
WI-4584 144 GTCCTMTGTGGTTTTGMMTAGGTGTGCTTTMTTTGTFTATCAGTATGC
TTTCTGCATTTGMTGTGTATGGTCAGACTTCAGAGGMCCCAGGMTCTCATTTATTCAGTACMTA TGGTGGCCAGGTGCTCAGGCCCTATTATCAGAGAGATCTCAGTTTMCTTTCCMTTCCACCATTTAC TGACCATATGACTTGGGGMCATTATCTCACCTATCTGAGTCTGTATCC[CtηCATCTTTAMTTGTA
WI-4639 1 85 AATTTTAAGGACACCTATCATAGTAATATTGTGAGGATAAAATGAAATAA
AMTGMTCCGCTFTAGAGCAMTACCAGTMGGGCTGGTGCAGGATGGTGGTGGCTGAGAGA[A/- ]GATTACTCATAAAAGCATATTAATTTTATAAATATGGAMATTFAACTAGATAATTAAATGTGAAT TGAGTTTGMGGTTGCATGAGAGTAGGGAGGAGGTAGTTTCTACTTATAGGGTTTATATMGTNTGCT
WI-5327 63 A TCMTAGMTGGCTCTTTCGGATGACAATGATGAACTGTTCTMGCAGACAG
GCTTTTGAGMTGMMGGGGAGCCTGGACCATTGCAGGGCTTCTTCATCTCTGATTATTTTGTGTAT TTATTGTTCACTTATTTAT[C/T]GTCTGTCTCCCCTTCTGGTATGCTFGTGTCATGAMCMT GMTTC CCCAGTGCCTGGCCCGATTCGTGGCTCCTAGAGGTGTCCAGAAAAAMGTTTCGGTGMTAGMTTG
WI-5390 87 ACGMTGGGTTCAGMTTGMACCTGTGMTCTATGGMGACAMCGAAT
CCTTGCCTGCTTTATGCATMTGAGMTAGAGTTGACTCTCCTGTCMGMATCMTFATTMGCAGT GCAAACATTATTTTAATTT[G/A]AAAGAAACTTGTTTCTGAAACTTTGTACTCTTGTAGTNAAATTG MTCTTTCCTTCTCAGCAGTTTCCATGGTCGTGMTCCACCCCATCTCTTTTCACCAGTAGCMGATT
Wl-5404b 87 GCTACTTATATGGAAGGGTTTTAGAGTTCATMCM
CCTTGCCTGCTTTATGCATMTGAGMTAGAGTTGACTCTCCTGTCMGMATCMTTATFMGCAGT GCAAACATTATTTTAATTT[G/A]AAAGAAACTTGTTTCTGAMCTTTGTACTCTTGTAGTNAAATTG MTCTTTCCTTCTCAGCAGTTTCCATGGTCGTGMTCCACCCCATCTCTTTTCACCAGTAGCMGATT
WI-5404 I 87 GCTACTTATATGGMGGGTTTTAGAGTTCATMCM
TAGGAMGGGGATGGTGATGGCCTCTGAGACATFTAMTCTATTCTTTCACCACTCACACTGCCGCCA TATCTCCTC[A/C]CCMCACCTCTGTTTTCTGACAGCCMGTTTCCATCAGTTGATATGGGACTATTT GTTGCAAMCMTTGTTMMGATTTGGCTGACTTTGGCTGMTTTGCTACMCTCCMMAGANTC
Wl-5545b 77 GAGATACACCATGMTTTTATTTTCATTTCA
TAGGAMGGGGATGGTGATGGCCTCTGAGACATTTAMTCTATTCTTTCACCACTCACACTGCCGCCA TATCTCCTC[A/C]CCMCACCTCTGTTTTCTGACAGCCMGTTTCCATCAGTTGATATGGGACTATTT GTTGCAAMCMTTGTTMMGATTTGGCTGACTTTGGCTGMTTTGCTACMCTCCAAAMGANTC
WI-5545 77 A' C GAGATACACCATGMTFTTATTTTCATTTCA
TMTTGCACAACTTACATATCAGGGTTTCTGATTGAMGGMGAGMTATTCCTTTCTTTTAGTGATT GCTTAATATTMTTCATAATMGTGCACCATCTCTΓF/CJGCTCCTTATAAATGTGTTTAGMGMGG MATTGAGTGTTGGGMTTMGCMCCAGGAGACATTTTTATATACTCCTACAGTGGGGGMGACTT
WI-6244 1 03 CCTATTTTCTΓTCCCMGGATGGATACATTTCTAC
CTGGCCTTATMTCCMGTFTAGGATTMTCTTACCCCMCTTMTAGACTFCCAGACAGTTGCAGTT GTCTACMGATTTCCTCCTAGTAGGGCTTTGGGTGTTGGCACCGTTTGGCTCATTC[C/ΗACTCTCCCT GGGTCTTATTGACTTTCAGGGAGCCTAGMGAGCTGGACMMCCTGCTTCTTTGCAGAMGAGTCG
WI-6268 1 24 GGGTTCCAMGATTTCGTTACGAI M I M A
AGGTGCCATTTMTCCATTCAMTTTGGMGCTACATCTTCMGGGTCTGAGAGAGCTCACTCCCCCC ATATATTCCCCCTTFACATGTTTFCTFATMGACATACAGTTTAATCMTTAACAMCTMACAGCTT ATATACTGGCMTATATTACAGATGGGTTTATGTCAGAGTAATAGATCACATGAMTGGACCATGTG
Wi-6336b 234 GTACCCCAGTGCATTATGTCTTGGTAGAGCC[C/T]TGAGGACACTGACAGT
AGGTGCCATTTMTCCATTCAMTTTGGMGCTACATCTTCMGGGTCTGAGAGAGCTCACTCCCCCC ATATATTCCCCCTTTACATGTTTTCTTATMGACATACAGTTTMTCMTTAACAAACTAMCAGCTT ATATACTGGCAATATATTACAGATGGGTTTATGTCAGAGTAATAGATCACATGAAATGGACCATGTG
WI-6336 234 T GTACCCCAGTGCATTATGTCTTGGTAGAGCC[C/T]TGAGGACACTGACAGT
TTGGATACAAAAATTCAGTTACACAATCAGTAGCATTCMMTTAGTTATGAGTATTTATACAATTA CAAAAATGGNTTCATGTTTTMCM[C/A]GTATTTTAMAGCTCAAACATTTTAAAACAGGCACAAT ATTCTMNGGCATATGCATTCACCATGGGCTTTTGMTGTCCTCACTCCCMCTTCACMTCAAMTC
WI-6381 92 TACAGANGCGGCAAMGATCAGAGTTCAG
GGTTGAGGCATTGGGAMGGCAGAMTTGAGGCAGTAGAMATGGACATTTTAGGAAMGAGMGT TCAGAGGCAAAGTCATGACAGACAGGAMTACMGGCTTAGGMGACAGTAGTCTCTGTGGTTGM ATTTTGGTGTCATMTMGMGTTTAGACTTTGGTGGTTGTAGTAGTTGTAGTAGTAGGTAGCGTT[C/
WI-6436 1 98 G]ATTGGGTGTATTCCACAGACMGGTGATGTTCTMGATTTGATATTTATTGT
GAGGCCTCTTTGCTTTTCCTCAGTCMGGCTGTATCCAGGGTTGATATCTAGCCTATATGCCATATGT GTATGGCTAGTGTTTGTTCTGATTGGTTGGTGCTCACACTGCCCAGATTGTTAMTATTTTGAAMTC GTATCTGGTTCTATTCATCTGCATTCTCTGATCTTATGTCTGGCTCTATT[C/ηATCCCTATTCTCTGA
WI-6449 1 86 TCTTATGTCAGACCTGMGTTCCTCTMI I I I I CTGTGGTGTATTTATA
GAGGCCTCTTTGCTTTTCCTCAGTCMGGCTGTATCCAGGGTTGATATCTAGCCTATATGCCATATGT GTATGGCTAGTGTTTGTTCTGATTGGTTGGTGCTCACACTGCCCAGATTGTTAMTATTTTGAAMTC
GTATCTGGTTCTATTCATCTGCATTCTCTGATCTTATGTCTGGCTCTATT[C/T]ATCCCTATTCTCTGA
WI-6449 1 86 C T — TCTTATGTCAGACCTGMGTTCCTCTM I I I I I CTGTGGTGTATTTATA
GCTGGAGAGAAMGACCTCCAAMGMGAMCTMATCAGAGTCTCTTGAGCMGAGGMTTGMA AGAACA[T/C]TGAAAAMATTAAAGTAGAACTCAMGAGCCMAAAGTCCCCMTTGTGTCCATTA TMGMATATTFTGAATGGAMTCTTMGAATGATTTTATTGATCAGTTAMTGTTCTTCCTCTCCTC
WI-6463 72 CAGTCCCATTTATATGACATTCCGCATGCTG
MGCAGTAMTCTTCCATCATGCCATGGATGCCAGTGGGTAMTGTTATAGAMCTTCAGAGGANAC AGAGGCAM[C/T]GTTGGTTATAGCAGTCMCGACATCATCMTGMGACATGACTTGCTTAGAGCC MGMMAGTAGGATTTTGAMGGCACAGAGAAMGGGGTGTACTAGAGGAGMCTATGTMGCAG
Wl-6474b 76 T AGGTATAGAGGMCTMAGTATAAMGAGTGAGCCATMCTTAGGGTACCATAA
MGCAGTAMTCTTCCATCATGCCATGGATGCCAGTGGGTAMTGTTATAGAMCTTCAGAGGANAC AGAGGCAM[C/T]GTTGGTTATAGCAGTCMCGACATCATCMTGMGACATGACTTGCTTAGAGCC MGAAAMGTAGGATTTTGAMGGCACAGAGAAMGGGGTGTACTAGAGGAGMCTATGTMGCAG
WI-6474 76 AGGTATAGAGGMCTMAGTATAAMGAGTGAGCCATMCTTAGGGTACCATM
GMCTCMTTMCTTTGCMCACTGAGAAMTCGGATTTGGAGATCTGCAMGCTGAGGTTGAGATT TTGGACCTTGGTGATCCAMTGGGGMTGCCACGCTTCGAGGCCTGTCTATATGCTTTATTTTTGTGA CACTGTCTATTTACCCTCCCCCMTAGTGGAGMTCAGAG[T/A]GCTCCTTGTCAGTGTTGCTACAGA
Wl-6478b 175 GMGATATACAGGATGGMGGACAGCTCCTCGTAGGACCTAGACACMCTG
GMCTCAATTMCTTTGCMCACTGAGAAMTCGGATTTGGAGATCTGCAMGCTGAGGTTGAGATT
TTGGACCTTGGTGATCCAMTGGGGMTGCCACGCTTCGAGGCCTGTCTATATGCTTTATTTTTGTGA CACTGTCTATTTACCCTCCCCCMTAGTGGAGMTCAGAG[T/A]GCTCCTTGTCAGTGTTGCTACAGA
WI-6478 1 75 GAAGATATACAGGATGGMGGACAGCTCCTCGTAGGACCTAGACACMCTG
CACATTTTGMTGCMCTGAGAMNTGGTTΓTNTAGGCCTACCTTTTATTTMGAGTACATCTGGCTC CMTGTTACCCCAMCATGCAMACATMGGCMCMTTCTGATCATTTTATAGGNTCCCMGCCCA
TTAGCAATATCTTA[G/A]TCAAATTTTAMAAGAGMCAGGAAATMGGAAGGCCTMCAGAGGAG
WI-6559 149 TTAAATMTTGTGCAAMCTTATCAGTTCTTC
TTCTTTATTGGTCCTACCMTGTGACTCTTTACCCAGGCCCACTGTTCCTATGC[G/A]CACTGGCTTTG TAGGCATTCACATCATATGTCTGTGTCCTGAAMTCTCMTTMTTTCTCCTNCCTATFCCTTTTCCATl GCTCTGCCTCATTTNCTCAGAMTTGMGGCATTTGATTATNA I I I I I I I GTTTGGGTCTGTGTAMG
Wl-6564b 54 GTTCCTTGGCAGGAGMCATGCATATGACTTTAAMTMAGACCMCA
TTCTTTATTGGTCCTACCMTGTGACTCTTTACCCAGGCCCACTGTTCCTATGC[G/A]CACTGGCTTTG
TAGGCATTCACATCATATGTCTGTGTCCTGAMATCTCMTTMTTTCTCCTNCCTATTCC' FCCATl
GCTCTGCCTCATTTNCTCAGAMTTGMGGCATTTGATTATNATTT1 GTTTGGGTCTGTGTAMG
WI-6564 54 G! GTTCCTTGGCAGGAGMCATGCATATGACTTTAAMTMAGACCMCA
GCATGATTAMCCAGTGCAGAMMTACCMGTACATTGGGTGMCGATGAGCTAGCTGTTCTAGTA TTTGCTTTTTGTMTCCAGTTMGACCATCAGCATATACAACATCATCACTAACTCMCMTGTAGCT GCAGGGTMC[C/A]TGTGGATACCCTGTGTGCTCTACTNGCCTCCAMGGCATTCAGGGGATCATCA
Wl-6817 1 45 MGATGTTGGACACCTTGTGTTCAMTCTTGGTTCAGGTGCGGCCTGTGCAG
GATGGMAGCCATTTTA I I I I I CTCTMATTTTAAAATAGMGACTTTAATGGAAMCATTTAGTAC CATCATGTCACCCTGMTGCCAGCMTACCTCGACTTTTACACACGCAGGMGCCTAGTAAMGCCC CGTCAGTAGTACACATTTCTCTATGGTCCTTCMCAGTT TTTTGGCCAATTAATTAACCAAAAMMTTTTTCTGCTATTT
Wl-6819b 221 CTTTAGCAMCAGCMTMCT TTGTGTTTCCTATATGACACCTAATATCCAG
GATGGAAAGCCATTTTA I I I I I CTCTAAATTTTAAAATAGMGACTTTAATGGAAMCATTTAGTAC CATCATGTCACCCTGMTGCCAGCMTACCTCGACTTTTACACACGCAGGMGCCTAGTAMAGCCC CGTCAGTAGTACACATTTCTCTATGGTCCTTCMCAGTTTT[G/T]CATATACAMATTTTCTGCTATT
Wl-6819a 175 TTGCTTTAGCMACAGCMTMCTTTTGTGTTTCCTATATGACACCTMTAT
GCAAAMGCTTTATTGGCTCCMCMATTATCCCTTTTAAMCTCCTCTTCTTCTTCTGGTCTCAGTG GAACAACACATTTGMTTTCAGATTTGCAGTTTATAGCA I I I I I I I I CCCTAAGMCCATATAMTAC ATGCAAAACCTTGTACAT[A/G]GAGCTTAAATMTATCAAAATGCAAATATAGATTGGGTGCACTGT
Wl-6826b 1 54 TMGCTGMTTGCAMTTATGGCMCACACACTGGACTGGGGTATACGTTG
GCAAAMGCTTTATTGGCTCCMCAMTFATCCCTTTTAAAACTCCTCTTCTTCTTCTGGTCTCAGTG
GAACAACACATTTGAATTTCAGATTTGCAGTTTATAGCA I I I I I I I I CCCTMGAACCATATAMTAC ATGCAAAACCTTGTACAT[A/G]GAGCTTAAATMTATCAAAATGCMATATAGATTGGGTGCACTGT
WI-6826 1 54 TMGCTGMTTGCAMTTATGGCMCACACACTGGACTGGGGTATACGTTG
AGTGCAMCTATTTTGMCAAMGTAMCTATGAGTCACAGCATTCAGCMGACATCAGACACGGA AGAGTGMCMTATTCACTMGTMAATACAGCAGATGAGATGTCTCTCACATGTA[T/C]ATTTMT TATFCATGC I I I I I CMTAGTCTCTTAGTCMCTTTCAGTGTMTTTCCACAMTATATAGCAGCTCA
Wl-6857a 1 22 MCACMATGCAGGAGCACMTGGCMAGTTTGGCMCTGTTTTGGGCTMTT
TTATAGMTACTTATGGGGCATACGNGTAMTGAACTGTCMCCTTMAATCTAMCMACAGCTTG TTTGTGGTTCGTCCTGMATCCTCCCTGCTCACAAMCAGCCAGCTACTNGGTTTTCTAAMGACGTA ATTTTGCAGGCAMCTTC[G/A]TAGAGCCATTCTGTGCAGMGMGGGMGGGAGMGCTGTTTGTT
WI-6865 1 53 G A TTACCTGTAGTATGMGATATTCTTTGCGCTGTTAGMCTGAGCTCATFM
ATTGAMACTGGTTAGCMCAGATAMTTACMTAGAGCCTGGATATAMMTGAGAGMGMTGC AGACTTA[C/T]MGCTTATAGAGAMGTCAAAMGGAGCMGTTTTTGMATCAGATTTTATGATAC GGAAAAAAMTTFCCTTFTTTTGCCMCAGGATTATTTCGMTAATAMTCTGCCAGTGCCMTCAG
WI-6909 73 C T AMCACCATTTCCACMTATTTGCATGCCCCTAGTTGCCTATTTTATACATATC
ACTTCTAGTGCCTCTGTTACCACCACCTCTMTGCCTCTGGTCGCCGCACTTCTGATGTCCGTAGGCCT TMATCTGCCTGGCGTCCCCTCCCTCTGTCTTCAGCACCCAGAGGAGGAGAGAGCCGGCAGTTCCCTG CAGGAGAGAGGAGGGGCTGCTGGACCCMGGCTCAGTCCCTCTGCTCTCAGGACCCCCTGTCCTGACT
Wl-6996b 242 CTCTCCTGATGGTGGGCCCTCTGTGCTCTTCTCTTCqG/ηGTCGGATC
ACTTCTAGTGCCTCTGTTACCACCACCTCTMTGCCTCTGGTCGCCGCACTTCTGATGTCCGTAGGCCT TAMTCTGCCTGGCGTCCCCTCCCTCTGTCTTCAGCACCCAGAGGAGGAGAGAGCCGGCAGTTCCCTG CAGGAGAGAGGAGGGGCTGCTGGACCCMGGCTCAGTCCCTCTGCTCTCAGGACCCCCTGTCCTGACT
WI-6996 228 CTCTCCTGATGGTGGGCCCTCTG[T/G]GCTCTTCTCTTCCGGTCGGATC
TGGGGAGGACAGGGAGATGCTGCAGTTCCAAMGAGMGGTFTCTTCCAGAGTCATCTACCTGAGTC CTGMGCTCCCTGTCCTGAMGCCACAGACMTATGGTCCCAMT[G/A]CCCGACTGCACCTTCTGTG CTTCAGCTCTTCTFGACATCMGGCTCTTCCGTTCCACATCCACACAGCCMTCCMTTMTCMACC
Wl-7021 b 1 1 2 ACTGTTATTMCAGATAATAGCAACTTGGGAMTGCTTATGTTACAGGTTA
TGGGGAGGACAGGGAGATGCTGCAGTTCCAAMGAGMGGTTTCTTCCAGAGTCATCTACCTGAGTC CTGMGCTCCCTGTCCTGMAGCCACAGACMTATGGTCCC[A/G]MTGCCCGACTGCACCTTCTGTG CTTCAGCTCTTCTTGACATCMGGCTCTTCCGTTCCACATCCACACAGCCMTCCMTTMTCMACC
WI-7021 1 08 ACTGTTATTMCAGATAATAGCAACTFGGGAMTGCTTATGTTACAGGTTA
GGCAGTAGGACCACCAGTGTGGGGTTCTGCTGGGACCTTGGAGAGCCTGCATCCCAGGATGCGGGTGG CCCTGCAGCCTCCTCCACCTCACCTCCATGACAGCGCTAMCGTTGGTGA[C/ηGGTTGGGAGCCTCT GGGGCTGTTGMGTCACCTTGTGTGTTCCMGTTTCCMACMCAGAMGTCATTCCTTCTTTTTAM
WI-7056C 1 1 8 ATGGTGCTTMGTTCCAGCAGATGCCACATMGGGGTTTGCCATTTGATA
GGCAGTAGGACCACCAGTGTGGGGTTCTGCTGGGACCTTGGAGAGCCTGCATCCCAGGATGCGGGTGG CCCTGCAGCCTCCTCCACCTCACCTCCATGACAGCGCTAMCGTTGGTGA[C/ηGGTTGGGAGCCTCT GGGGCTGTTGMGTCACCTTGTGTGTTCCMGπTCCMACMCAGAMGTCATTCCTTCTTTTTAM
Wl-7056b 1 1 8| ATGGTGCTTMGTTCCAGCAGATGCCACATMGGGGTTTGCCATTTGATA
MTTCGCTGMAAAGGMCTACCTATCCTTACATTTCACCTACTMTGTCTCTTCTMCATCTTAGAG GTCCATGGAGMGGCATATGGAGMCATGTTTTATACTGCTCTATAMTAGTATTCCMTCACTGTG CTTAATTTAAATAGCATT[A/C]TCTTATCATTFATCAGCCTTTTATGTATTTTCCAAGTAAAATATTA
Wl-7091 b 1 53 ACATATTATTTCATFGGTCTTC I I M I I ATCTGGTTCTATATGMTGCTAT
MTTCGCTGMMAGGMCTACCTATCCTTACATTTCACCTACTMTGTCTCTTCTMCATCTTAGAG GTCCATGGAGMGGCATATGGAGMCATGTTTTATACTGCTCTATAMTAGTATTCCMTCACTGTG CTTAATTTAAATAGCAT [A/C]TCTTATCATTTATCAGCCTTTTATGTATTTTCCMGTAAAATATTA
WI-7091 1 531 ACATATTATTTCATFGGTCTTC I I I I I I ATCTGGTTCTATATGMTGCTAT
TGTGMGCCACATTTTCCMCATGAGCCTCATGMGCCAACTMGTGTTATTGMCTGΓΓ/CIMTTC TCTCMTMCTCAGTGTAGCACTTTAMGTCTGMGGACAGCMCATGAAMGAGCATATCMTGTG
GTGGAGAMGGGMGGGGTTGGC I I I I I MTTTAT TTTTCTTCATCTTTTATMCMGMAGNNNNN
WI-7136 58 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGTAGCTTTCTATATATG
GGGACGCCTGTTGTT1TGGCTCAATTTGGGTTTGTTGGTCACATGGAGCTCTTCCATTTCGTTTAGCTG MTMTGAGTTGTTCCTAGAGGAGACAGCCTGTCTCTCCTFGTTGCCCCCAMGCCCATGCCCTGCCG TGGTGGCAGCTGGGGCTGTGGATGGGAGGGGTCCCCMCATGGATGTGTTGCCCCTCCTCCGCATGCC
WI-7146C 21 0 MCGC[A/G]GTTCATGTACMGGCCCCTCTGCMCTGGAGAGAAMTTA
GGGACGCCTGTTGTTTTGGCTCMTTTGGGTTTGTTGGTCACATGGAGCTCTTCCATTTCGTTTAGCTG MTMTGAGTTGTTCCTAGAGGAGACAGCCTGTCTCTCCTTGTTGCCCCCAMGCCCATGCCCTGCCG TGGTGGCAGCTGGGGCTGTGGATGGGAGGGGTCCCCMCATGGATGTGTTGCCCCTCCTCCGCATGCC
Wl-7146b 21 0 MCGC[A/G]GTTCATGTACMGGCCCCTCTGCMCTGGAGAGAAMTTA
GGGACGCCTGTTGTTTTGGCTCMTTTGGGTTTGTTGGTCACATGGAGCTCTTCCATTTCGTTTAGCTG MTMTGAGTTGTTCCTAGAGGAGACAGCCTGTCTCTCCTTGTTGCCCCCAMGCCCATGCCCTGCCG TGGTGGCAGCTGGGGCTGTGGATGGGAGGGGTCCCCMCATGGATGTGTTGCCCCTCCTCCGCAηGA
WI-7146 202 ICCMCGCAGTTCATGTACMGGCCCCTCTGCMCTGGAGAGAAMTTA
ATATTACMCTTGC I I I I I AGCTGATCTTCCATCCTCMATGACTC I I I I I I CTTTATATGTTMCATA I TATAAMTGGCMCTGATAGTCMTTTTGAI I I I lATTCAGGMCTATCTGAMTCTGCTCAGAGCCT ATGTGCATAGATGAAACNNNNNNNtA/T]AAAAAAAGTTATTTAACAGTAATCTATTTACTAATTAT
WI-7153 1 61 AGTACCTATCTTTAMGTATAGTACATTTTACATATGTAAATGGTATGTTT
TAGMTAGATGCGGTCATATTCTTCTTTGGCTTCTGGTTCTTCCAGCCCTCATGGTTGGCATCACATAT GCCTGCATGCCATTMCACCAGCTGGCCCTACCCCTATMTGATCCTGTGTCCTAMTTMTATACAC CAGTGGTTCCTCCTCCCTGΓT/G]TAAAGACTMTGCTCAGATGCTGTTTACGGATATTTATATTCTAG
WI-7155 1 56 TCTCACTCTCTTGTCCCACCCTTCTTCTCTTCCCCATTCCCMCTCCAG
AGCTCCACCAGATGCAGATTTGTGTTΓTGTTTTCTTGTTATCACTGTCACACAGCTTATMCATGTAT
GCTTTTCAGMTACAGTTGTCTAGCCMGCCATCMGTGTCTGMATTCMTATTGGTTTATGCAMT
ACAGCAAACTTTTATTTAAGTAGAT[A/G]GGAGAATATGTTTAAAATATTAGGAATCCTAGACCATA
Wl-7169b 1 61 TTTCMGTCATCTTAGCAGCTAGGATTCTCAMTGGMGTGTTATATATA
CTCCTAGACTAGTGCTTTACCTTTATTMTGMCTGTGACAGGMGCCCMGGCAGTGTTCCTCACCA ATMCTTCAGAGMGTCAGTTGGAGAAMTGMGMMAGGCTGGCTGAAMTCACTATMCCATC AGTTACTGGTTTCAGTTGACAAMTATATMTGGTTTACTGCTGTCATTGTCCATGCCTA[C/T]AGAT
Wl-7175b 1 94 MTTTATTTTGTA I I I I I GMTMAMACATTTGTACATTCCTGATACTGGG
TGAMTCCTGGGTCTCTTGGCCTGTCCTGTAGCTGGTTTATTTTTTACTTTGCCCCCTCCCCAC I I I I I I TGAGATCCATCCTTTATCAAGAAG[T/A]CTGAAGCGACTATAMGGTTTTTGMTTCAGATTTAAM ACCMCTTATAMGCATΓGCMCMGGTTACCTCTATTTTGCCACMGCGTCTCGGGATTGTGTTTGA
WI-7388 94 CTTGTGTCTGTCCMGMCTTTTCCCCCAAAGATGTGTATAGTTATTGG
TTAGATTTTMTFGGCMCCAGCMCTCACTGCCACCATTCCACTGCAGATCTNCTATTCCTGG[A/G] GTTGATATGACMGGMACCCTATTGGMCCMGTCTTCAGATTGTNCCATGTGCAGACAGGCTCCT TGTCTGTAGGTGTAGTAGCATGTACACTGTACTGTTCACTGTMCATAGTTTGTNCTGGTATTTGTTA
WI-7438 64 TTGGAMTGMTATCGCTTCCACTGACTTTTACCA
CCATGATCCCCTCCTCTTGCCAMTGGAGGMGCCTGTGGATGGTACCMCAMCMGCCCCAMCC CAGTACAMCTGAGMTGAGAGMCCCTGATAGCACTGTCTGMTTGCCAGGAGCCTCCMGGCTM TCCTACCCCTGGATTTCT|T/C]TGTTGTTTAAGTTATTTCTAGCCACCACAMGAGGGTACTGCCCM
Wl-7454b 1 52 CAGACTCATCCTTAAAMATCCCATTTGTCTACTTCTCAMTG I I I I I GACA
CCATGATCCCCTCCTCTTGCCAMTGGAGGMGCCTGTGGATGGTACCMCAMCMGCCCCAMCC CAGTACAMCTGAGMTGAGAGMCCCTGATAGCACTGTCTGMTFGCCAGGAGCCTCCMGGCTM TCCTACCCCTGGATTTCT[T/C]TGTTGTTTMG"FTATTTCTAGCCACCACAMGAGGGTACTGCCCM
WI-7454 1 52 CAGACTCATCCTTAAAAAATCCCATTTGTCTACTTCTCAMTG I I I I I GACA
AATTTGAAAATCTGAAAAAAAGTGCATAAGCAGAGAMTGACACTTATTCCAAATAAATAAATTGT CCA I I I I I CACTCAGTCCATCTTAACCATGTACAATGCACTAMTTACTATTFATMTTTCCTATGTA CMCAGAGCCACAGCACMGAGGGTGGGCATMGCAGTTGCCA[G/C]CCAGMGAGCTTTCACTCAT
Wl-7464c 1 77 GAMGMAGCCCTACAMTAGGCCCAGGAGMGCMCGTTCACCMCMTTAT
AATTTGAAMTCTGAMAAMGTGCATAAGCAGAGAMTGACACTTATTCCAAATAMTMATTGT CCA I I I I I CACTCAGTCCATCTTMCCATGTACAATGCACTAMTTACTATTTATMTTTCCTATGTA CMCAGAGCCACAGCACMGAGGGTGGGCATMG[C/A]AGTTGCCAGCCAGMGAGCTTTCACTCAT
Wl-7464b 1 68 GAMGMAGCCCTACAMTAGGCCCAGGAGMGCMCGTTCACCMCMTTAT
MTTTGMMTCTGAMMMGTGCATMGCAGAGAMTGACACTTATTCCAMTMATAMTTGT CCA I I I I I CACTCAGTCCATCTTMCCATGTACAATG[C/A]ACTAAATTACTATTTATMTTTCCTAT GTACMCAGAGCCACAGCACMGAGGGTGGGCATMGCAGTTGCCAGCCAGMGAGCTTTCACTCAT
Wl-7464a 1 03 GAMGMAGCCCTACAMTAGGCCCAGGAGMGCAACGTTCACCMCMTTAT
CMTTCTCMTCCMCCTAGTCTGTNTGCCTAMCCATTCCAGACAMCTTCCACTTCGMGGTTTTA MTGCATMGTCAGATAGCMTCCTTCAGTTGCCCCAGAGGCACATCACGTTCTTTGMTGCTTCA[T
/G]TATAGTCCTCTTCATTTAGCMTCAGTGAGGCMTACACTGGCATCATGATCCCI I I I I I IAGGA
Wl-7499b 1 34 G! ACTCTGTACAAMTTCCCTTTGMMTATAMTTTTGGAAATGAGTGATGA
CMTTCTCAATCCMCCTAGTCTGTNTGCCTM[A/G]CCATTCCAGACAMCTTCCACTTCGAAGGTT TTAMTGCATMGTCAGATAGCAATCCTTCAGTTGCCCCAGAGGCACATCACGTTCTTFGMTGCTTC ATTATAGTCCTCTTCATTTAGCMTCAGTGAGGCMTACACTGGCATCATGATCCCTTTTTTTAGGM
Wl-7499a 33 CTCTGTACAAMTTCCCTTTGAMATATAMTTTTGGMATGAGTGATGA
TGGGMTAGTMGAGAMGATGGGAAAGGTGACCAMAACMTATAGAGGCAGAGGCCMGTGMT GCATCCCAGCAGCAGACCACTTNAAMGTAGTCCTGGTGCTGATTGCCTAGC[A/C]GGAGAGTTGAG TGCCACAGGTAAGAATGAGTGMGAGGAAAAMTCATGATGTCATGTATGCAGTMTTACTATGTCA
Wl-7506b 1 1 8 GMGMMTATTTTAAMTATTGGACCACTCTTGTTCTACCATCCCTACCCACT
TGGGMTAGTMGAGAMGATGGGAMGGTGACCAMMCMTATAGAGGCAGAGGCCMGTGMT GCATCCCAGCAGCAGACCACTTNAAMGTAGTCCTGGTGCTGATTGCCTAGC[A/C]GGAGAGTTGAG TGCCACAGGTMGMTGAGTGMGAGGAAMMTCATGATGTCATGTATGCAGTMTTACTATGTCA
WI-7506 1 1 8 GMGAAMTATTTTAAMTATTGGACCACTCTTGTTCTACCATCCCTACCCACT
TGTGMTTCTTAGCTCTGGMGGTGTTTATGCCTTTGCGGGTTTCTTGATGTGTTCGCAGTGTCACCCA AGAGTCAGMCTGTACACATCCCAAMTTTGGTGGCCGTGGMCACATTCCCGGTGATAGMTTGCT AMTTGT[C/T]GTGAAATAGGTTAGMTTTTTCTTTAAATTATGGTTTTCTTATTCGTGAAAATTCGG
Wl-7534b 1 43 AGAGTGCTGCTAAMTTGGATTGGTGTGATCTTTTTGGTAGTTGTMTTT
TGTGMTTCTTAGCTCTGGMGGTGTTTATGCCTTTGCGGGTTTCTTGATGTGTTCGCAGTGTCACCCA AGAGTCAGMCTGTACACATCCCAAMTTTGGTGGCCGTGGMCACATTCCCGGTGATAGMTTGC[T/C]MATTGTCGTGAAATAGGTTAGMTTTTTCTTTAMTTATGGTTTTCTTATTCGTGAAAATTCGG
WI-7534 1 35 AGAGTGCTGCTAAMTTGGATTGGTGTGATCTTTTTGGTAGTTGTMTTT
GGGMAGMTAAMTTAGCTTGAGCMCCTGGCTMGATAGAGGGGCTCTGGGAGACTTTGMGACC AGTCCTGTTTGCAGGGMGCCCCACTTGMGGMGMGTCTMGAGTGMGTAGGTGTGACTTGMC TAGATTGCATGCTTCCTCCTTTGCTCTT[G/A]GGMGACCAGCTTTGCAGTGACAGCTTGAGTGGGTT
Wl-7543b 1 62 CTCTGCAGCCCTCAGATTATTTTTCCTCTGGCTCCTTGGATGTAGTCAGTTA
GGGAMGMTMMTTAGCTTGAGCMCCTGGCTMGATAGAGGGGCTCTGGGAGACTTTGMGACC AGTCCTGTTTGCAGGGMGCCCCACTTGMGGMGMGTCTMGAGTGMGTAGGTGTGACTTGMC TAGATTGCATGCTTCCTCCTTTGCTCTT[G/A]GGMGACCAGCTTTGCAGTGACAGCTTGAGTGGGTT
WI-7543 1 62 G A CTCTGCAGCCCTCAGATTATTTTTCCTCTGGCTCCTTGGATGTAGTCAGTTA
GGTGATCMGATCTGTTCCACAGGGCTMTGCCACCATCTCCCCTCAMATTTGTAGAGG[T/C]TCTA MMGAMGTGGTATGTTGTGTGATGATCAGCACTMGTCCTGCATTCCTGTTAMGCCACTTGGGTC ATMGMGGGMGTAAAAMTGAAGTCTGACTAGAMTTCTATTGCAGAGGCCMGTACATTTAGT
WI-7555C 60 T C ATGGCATTGAGTTGTGATATAGTTTTCATTTGATGTGCATTTTGMTTTCAG
MCCATGTTCCCTTCTTCTTAGCACCACAMTMTCAAMCCCMCATMGTGTTTGCTTTCCTTTM AMTATGCATCAMTCGTCTCTCATTACTTTTCTCTGAGGGTTTTAGTA[A/G]ACAGTAGGAGTTMT AMGMGTTCATTTTGGTTTACACGTAGGAMGMGAGMGCATCAMGTGGAGATATGTTMCTAT
W1-7577J 1 1 7 TGTATMTGTGGCCTGTTATACATGACACTCTTCTGMTTGACTGTATTTC
MCCATGTTCCCTTCTTCTTAGCACCACAMTMTCMMCCCAACATMGTGTTTGCTTTCCTTTM AMTATGCA[T/C]CAAATCGTCTCTCATTACTTTTCTCTGAGGGTTTTAGTAMCAGTAGGAGTTMT MAGAAGTTCATTTTGGTTTACACGTAGGAMGAAGAGMGCATCAMGTGGAGATATGTTMCTAT
WI-7577J 77 TGTATAATGTGGCCTGTTATACATGACACTCTTCTGAATTGACTGTATTTC
MCCATGTTCCCTTCTTCTTAGCACCACAMTMTCMMCCCMCATM[G/C]TGTTTGCTTTCCTT TMMATATGCATCAMTCGTCTCTCATTACTTTTCTCTGAGGGTTTTAGTAMCAGTAGGAGTTMT AMGMGTTCATTTTGGTTTACACGTAGGAMGMGAGMGCATCAMGTGGAGATATGTTMCTAT
Wl-7577 50 TGTATMTGTGGCCTGTTATACATGACACTCTTCTGMTTGACTGTATTTC
MCCATGTTCCCTTCTTCTTAGCACCACAMTMTCAAMCCCMCATAAGTGTTTGCTTTCCTTTM MATATGCATCAAATCGTCTCTCATTACTTTTCTCTGAGGGTTTTAGTAMCAGTAGGAGTTMTM AGMGTTCATTTTGGTTTACAC[G/A]TAGGAMGMGAGMGCATCMAGTGGAGATATGTTMCT
Wl-7577g 1 57 ATTGTATMTGTGGCCTGTTATACATGACACTCTTCTGMTTGACTGTATTTC
MCCATGTTCCCTTCTTCTTAGCACCACAAATAATCAAAACCCAACAT[A/G]AGTGTTTGCTTTCCTT
TMAAATATGCATCAMTCGTCTCTCATTACTTTTCTCTGAGGGTTTTAGTAMCAGTAGGAGTTMT
AMGMGTTCATTTTGGTTTACACGTAGGAMGMGAGMGCATCAMGTGGAGATATGTTMCTAT
Wl-7577f 48 G TGTATMTGTGGCCTGTTATACATGACACTCTTCTGMTTGACTGTATTTC
AACCATGTTCCCTTCTTCTTAGCACCACAMTMTCMAACCCMCATMGTGTTTGCTTTCCTTTM MATATGCATCAAATC[G/A]TCTCTCATTACTTTTCTCTGAGGGTTTTAGTAMCAGTAGGAGTTMT MAGMGTTCATTTTGGTTTACACGTAGGAMGMGAGMGCATCAMGTGGAGATATGTTMCTAT
Wl-7577e 84 TGTATMTGTGGCCTGTTATACATGACACTCTTCTGMTTGACTGTATTTC
MCCATGTTCCCTTCTTCTTAGCACCACAMTMTCMMCCCMCATMGTGTTTGCTTTCCTTTM MATATGCATCMATCGTCTCTCAT[T/C]ACTTTTCTCTGAGGGTTTTAGTAMCAGTAGGAGTTAAT AMGMGTTCATTTTGGTTTACACGTAGGAMGMGAGMGCATCAMGTGGAGATATGTTMCTAT
Wl-7577d 93 TGTATMTGTGGCCTGTTATACATGACACTCTTCTGMTTGACTGTATTTC
MCCATGTTCCCTTCTTCTTAGCACCACAMTMTCMMCCCMCATMGTGTTTGCTTTCCTTTM MATATGCATCAMTCGTCTCTCATTACTTTTCTCTGAGGGTTTTAGTAMCAGTAGGAGTTMTM AGMGTTCATTTTGGTTTA[C/A]ACGTAGGAMGMGAGMGCATCAMGTGGAGATATGTTMCT
WI-7577C 1 54 C A ATTGTATMTGTGGCCTGTTATACATGACACTCTTCTGMTTGACTGTATTTC
TTAMTGAGTGTGTTTGTCACCGTTGGGGATTGGGGMGACTGTGGCTGCTGGCACTTGGAGCCMGG GTTCAGAGACTCAGGGCCCCAGCACTAMGCAGTGGACCCCAGGAGTCCCTGGTMTMGTACTGTG TACAGMTTCTGCTACCTCACTGGGGTCCTGGGGCCTCGGAGCCTCATCCGAGGCAGGGTCAGGAGAG
Wl-7743d 275 T GGGCAGMCAGCCGCTCCTGTCTGCCAGCCAGCAGCCAGCTCTCAGCCMCG
TTAMTGAGTGTGTTTGTCACCGTTGGGGATTGGGGMGACTGTGGCTGCTGGCACTTGGAGCCMGG GTTCAGAGACTCAGGGCCCCAGCACTAMGCAGTGGAC[C/A]CCAGGAGTCCCTGGTMTMGTACT GTGTACAGMTTCTGCTACCTCACTGGGGTCCTGGGGCCTCGGAGCCTCATCCGAGGCAGGGTCAGGA
WI-7743Θ 1 06 GAGGGGCAGMCAGCCGCTCCTGTCTGCCAGCCAGCAGCCAGCTCTCAGCC
TTAMTGAGTGTGTTTGTCACCGTTGGGGATTGGGGMGACTGTGGCTGCTGGCACTTGGAGCCMGG GTTCAGAGACTCAGGGCCCCAGCACTAMGCAGTGGACCCCAGGAGTCCCTGGTMTMGTACTGTG TACAGAATTCTGCTACCTCACTGGGGTCCTGGGGCCTCGGAGCCTCATCCGAGGCAGGGTCAGGAGAG
Wl-7743d 275 | T GGGCAGMCAGCCGCTCCTGTCTGCCAGCCAGCAGCCAGCTCTCAGCCMCG
TTMATGAGTGTGTTTGTCACCGTTGGGGATTGGGGMGACTGTGGCTGCTGGCACTTGGAGCCMGG GTTCAGAGACTCAGGGCCCCAGCACTAMGCAGTGGAC[C/A]CCAGGAGTCCCTGGTMTMGTACT GTGTACAGMTTCTGCTACCTCACTGGGGTCCTGGGGCCTCGGAGCCTCATCCGAGGCAGGGTCAGGA
WI-7743C 1 06 GAGGGGCAGMCAGCCGCTCCTGTCTGCCAGCCAGCAGCCAGCTCTCAGCC
TTAMTGAGTGTGTTTGTCACCGTTGGGGATTGGGGMGACTGTGGCTGCTGGCACTTGGAGCCMGG GTTCAGAGACTCAGGGCCCCAGCACTAMGCAGTGGACCCCAGGAGTCCCTGGTMTMGTACTGTG TACAGMTTCTGCTACCTCACTGGGGTCCTGGGGCCTCGGAGCCTCATCCGAGGCAGGGTCAGGAGAG
Wl-7743b 275 GGGCAGMCAGCCGCTCCTGTCTGCCAGCCAGCAGCCAGCTCTCAGCCMCG
TTAMTGAGTGTGTTTGTCACCGTTGGGGATTGGGGMGACTGTGGCTGCTGGCACTTGGAGCCMGG GTTCAGAGACTCAGGGCCCCAGCACTAMGCAGTGGAC[C/A]CCAGGAGTCCCTGGTMTMGTACT GTGTACAGMTTCTGCTACCTCACTGGGGTCCTGGGGCCTCGGAGCCTCATCCGAGGCAGGGTCAGGA
Wl-7743 1 06 GAGGGGCAGMCAGCCGCTCCTGTCTGCCAGCCAGCAGCCAGCTCTCAGCC
TTAMTGAGTGTGTTTGTCACCGTTGGGGATTGGGGMGACTGTGGCTGCTGGCACTTGGAGCCMGG GTTCAGAGACTCAGGGCCCCAGCACTAMGCAGTGGACCCCAGGAGTCCCTGGTMTMGTACTGTG TACAGAATTCTGCTACCTCACTGGGGTCCTGGGGCCTCGGAGCCTCATCCGAGGCAGGGTCAGGAGAG
WI-7743 275 GGGCAGMCAGCCGCTCCTGTCTGCCAGCCAGCAGCCAGCTCTCAGCCMCG
TGACATTTATTCAMGTTMMGCAMCACTTACAGMTTATGAAGAGGTATCTGTTTAACATTTCC TCAGTCMGTTCAGAGTCTTCAGAGACTTCGTMTTMAGGMCAGAGTGAGAGACATCATCMGTG GAGAGAAATC[A/G]TAGTTTAAACTGCATTATAAATTTTATAACAGAATTAAAGTAGATTTTAAAA
WI-7758 1 441 GATMMTGTGTMTTTTGTTTATATTTTCCCATTTGGACTGTMCTGACTGCC
ACAGGGCCTTTGGCAGGTGCAGCCCCCACTGCCTTTGACCTGCCTCCCTTCATGCATGGAMTTCCCT TCATCTGGMCCATCAGAMCACCCTCACACTGGGACTTGCAAAMGGGTCAGTATGG[G/C]TTAGG GMMCATTCCATCCTFGAGTCAMAMTCTCMTTCTTCCCTATCTTTGCCACCCTCATGCTGTGTG
Wl-7765b 1 26 ACTCAMCCAMTCACTGMCTTTGCTGAGCCTGTAMATAAMGGTCGGA
TTMTTTACTGATTCCAGCMGACCAMTCATTGTATCAGATTΛ I I I I I MGTTTTATCCGTAGTTT GATAAMGATTTTCCTATTCCTTGGTTCTGTCAGAGMCCTMTMGTGCTACTTTGCCATTMGGCA
GACTAGGGTTCATGTC I I I I LACCCTTTNNNNNNNNNTTGTAAMGTCTAGTTACCTACTTTTTCTTT
Wl-7773b 237 G GATTTFCGACGTTTGACTAGCCATCTCMGCM[C/G]TTTCGACGTTTGA
TGCMCCTCTTTTCGTGATGGGCAGCCTGCTGGTCAGCACTCCAGTAGCGAGAGACGGCACCCAGMT CAGATCCCAGCTTCGGCATTTGATCAGACCAMCAGTGCTGTTTCCCGGGGAGGAMCACTTTTTTM TTACCCTTTTGCAGGCACCACCTTTAATCTGTTTT[T/C]ATACCTTGCTTATTAMTGAGCGACTTMA
Wl-7774b 1 70 ATGATTGAAMTMTGCTGTCCTTTAGTAGCMGTAAMTGTGTCTTGCT
GCAGAGACCTTCCMGGACATATTGCAGGATTCTGTMTAGTGMCATATGGAMGTATTAGAMTA TTTATTGTCTGTAAATACTGTAMTGCATTGGMTAAMCTGTCTCCCCCATTGCTCTATGAMCTGC ACATTGGTCATTGTGMTANNNNNNNNNNNGCCMGGCTMTCCMTTATTATTATCACATTTACCA
WI-7785C 1 65 TMTTTATTTTGTCCATTGATGTATTTATTTTGTAAATGTATCTTGGTGCTGC
GCAGAGACCTTCCMGGACATATTGCAGGATTCTGTMTAGTGMCATATGGAMGTATTAGAMTA TTTATTGTCTGTAAATACTGTAMTGCATTGGMTAAMCTGTCTCCCCCATTGCTCTATGAMCTGC ACATTGGTCATTGTGMTANNNNNNNNNNNGCCAAGGCTMTCCAATTATTATTATCACATTTACCA
Wl-7785b 1 65 TAATTTATTTTGTCCATTGATGTATTTATTTTGTAMTGTATCTTGGTGCTGC
GCAGAGACCTTCCMGGACATATTGCAGGATTCTGTMTAGTGMCATATGGAMGTATTAGAMTA TTTATTGTCTGTAMTACTGTAMTGCATTGGMTMMCTGTCTCCCCCATTGCTCTATGAMCTGC ACATTGGTCATTGTGMTANN[-/T]
NNNNNNNNGCCAAGGCTMTCCMTTATTATTATCACATTTACCATMTTTATTTTGTCCATTGA
WI-7785 1 56 TGTATTTATTTTGTAMTGTATCTTGGTG
TCTCCCCCTCATCCMCTCCGAMGTCTGMTCTCCCMGGAGGGCACCATCTTACAGAGACTCTCCC TGACGGTGGMTTTM[G/A]TTTAGGGTCCCTAAMGCATTTGACACACAGTTGTTGMTGACTGAC CCAAMTGTGMTGMGCTMTGTGMTGTGAGTGMGCTCCCTTCAGGCCCGCTGCCCTAGGATAT
WI-7789C 84 GCCCTCCTGGTGACTCGGGGGCTGTCTCAGACGACTAGCCCAGGACCCATCT
TCTCCCCCTCATCCMCTCCGMAGTCTGMTCTCCCMGGAGGGCACCATCTTACAGAGACTCTCCC TGACGGTGGMTTTM[G/A]TTTAGGGTCCCTAAMGCATTTGACACACAGTTGTTGMTGACTGAC CCAAMTGTGMTGMGCTMTGTGMTGTGAGTGMGCTCCCTTCAGGCCCGCTGCCCTAGGATAT
Wl-7789b 84 G A GCCCTCCTGGTGACTCGGGGGCTGTCTCAGACGACTAGCCCAGGACCCATCT
GCAGGAAATAGTCACTCATCCCACTCCACATMGGGGTTTAGTMGAGMGTCTGTCTGTCTGATGA TGGATAGGGGGCAMTCTTTTTCCCCTTTCTGTTMTAGTCATCACATTTCTATGCCAMCAGGMCG
ATCCATMCTTTAGT[C/T]TTAATGTACACATTGCATTTTGATAMATTMTTTTGTTGTTTCCTTTG
Wl-7830d 1 50 T AGGTTGATCGTTGTGTTGTTTTGCTGCACTTTTTACTTTTTTGCGTGTGGA
GCAGGAMTAGTCACTCATCCCACTCCACATMGGGGTTTAGTMGAGMGTCT[G/A]TCTGTCTGA TGATGGATAGGGGGCAMTCTTTTTCCCCTTTCTGTTMTAGTCATCACATTTCTATGCCAMCAGGA ACGATCCATMCTTTAGTCTTAATGTACACATTGCATTTTGATAAMTTMTTTTGTTGTTTCCTTTG
WI-7830C 54 AGGTTGATCGTTGTGTTGTTTTGCTGCACTTTTTACTTTTTTGCGTGTGGA
GCAGGAMTAGTCACTCATCCCACTCCACATMGGGGTTTAGTMGAGMGTCTGTCTGTCTGATGA TGGATAGGGGGCAMTCTTTTTCCCCTTTCTGTTMTAGTCATCACATTTCTATGCCAMCAGGMC[G/A]ATCCATMCTTTAGTCTTAATGTACACATTGCATTTTGATAMATTMTTTTGTTGTTTCCTTTG
Wl-7830b 1 34 AGGTTGATCGTTGTGTTGTTTTGCTGCACTTTTTACTTTTTTGCGTGTGGA
GCAGGAAATAGTCACTCATCCCACTCCACATMGGGGTTTAGTA[A/G]GAGMGTCTGTCTGTCTGA
TGATGGATAGGGGGCAMTCTTTTTCCCCTTTCTGTTMTAGTCATCACATTTCTATGCCAMCAGGA ACGATCCATMCTTTAGTCTTAATGTACACATTGCATTTTGATAAMTTMTTTTGTTGTTTCCTTTG
Wl-7830 44 AGGTTGATCGTTGTGTTGTTTTGCTGCACTTTTTACTTTTTTGCGTGTGGA
CCACTTCCTATCTGATTTTTCCCAG[C/T]AAATGAGGCAGGCAATTCTAGTCTTCCACAAAACATCTA GCCATCTAAMTGGAGAGATGAATCATTCTACCTATACAMCMGCTAGCTATTAGAGGGTGGTTGG GGTATGCTACTCATMGATTTCAGGGTGTCTTCCMCTGMATCTCMTGTTCTCAGTACGAAAMC
Wl-7865e 25 CTGAAATCACATGCCTATGTAAGGAMGTGCTATTCACCCAGTAMCCCAM
CCACTTCCTATCTGATTTTTCCCAGCAMTGAGGCAGGCMTTCTAGTCTTCCACAAMCATCTAGCC ATCTAAMTGGAGAGATGAATCATTCTACCTATACAMCMGCTAGCTATTAGAGGGTGGTTGGGGT ATGCTACTCATAAGATTTCAGGGTGTCTTCCMCTGAAATCTCMTGTTCTCAGTA[C/T]GAAMAC
Wl-7865d 1 91 CTGAMTCACATGCCTATGTMGGMAGTGCTATTCACCCAGTAMCCCMA
CCACTTCCTATCTGATTTTTCCCAG[C/T]AAATGAGGCAGGCMTTCTAGTCTTCCACAAAACATCTA GCCATCTAAMTGGAGAGATGMTCATTCTACCTATACAMCMGCTAGCTATTAGAGGGTGGTTGG GGTATGCTACTCATMGATTTCAGGGTGTCTTCCMCTGMATCTCMTGTTCTCAGTACGAAAMC
WI-7865C 25 l C CTGAMTCACATGCCTATGTMGGMAGTGCTATTCACCCAGTAMCCCMA
CCACTTCCTATCTGATTTTTCCCAGCAMTGAGGCAGGCMTTCTAGTCTTCCACAAMCATCTAGCC ATCTAAMTGGAGAGATGMTCATTCTACCTATACAMCMGCTAGCTATTAGAGGGTGGTTGGGGT ATGCTACTCATAAGATTTCAGGGTGTCTTCCMCTGMATCTCMTGTTCTCAGTA[C/T]GAAMAC
Wl-7865b 1 91 CT CTGAMTCACATGCCTATGTMGGMAGTGCTATTCACCCAGTAMCCCAM
GCTCACTGTGACCCATCCTTACTCTACTTGGCCAGGCCACAGTAAMCMGTGACCTTCAGAGCAGCT GCCACMCTGGCCATGCCCTGCCATTGAMCAGTGATTMGTTTGATCMGCCATGGTGA[C/T]ACA AAMTGCATTGATCATGMTAGGAGCCCATGCTAGMGTACATTCTCTCAGATTTGMCCAGTGAM
Wl-7900d 1 28 T TATGATGTATTTCTGAGCTAAAACTCAACTATAGAAGACATTAAAAGAAATC
GCTCACTGTGACCCATCCTTACTCTACTTGGCCAGGCCACAGTAAMCMGTGACCTTCAGAGCAGCT GCCACMCTGGCCATG[C/T]CCTGCCATTGAMCAGTGATTMGTTTGATCMGCCATGGTGACACA AAAATGCATTGATCATGMTAGGAGCCCATGCTAGMGTACATTCTCTCAGATTTGMCCAGTGAM
Wl-7900e 84 TATGATGTATTTCTGAGCTAAMCTCAACTATAGMGACATTAAMGAAATC
GCTCACTGTGACCCATCCTTACTCTACTTGGCCAGGCCACAGTAAMCMGTGACCTTCAGAGCAGCT GCCACMCTGGCCATGCCCTGCCATTGAMCAGTGATTMGTTTGATCMGCCATGGTGA[C/T]ACA AAMTGCATTGATCATGMTAGGAGCCCATGCTAGMGTACATTCTCTCAGATTTGMCCAGTGAM
Wl-7900d 1 28 TATGATGTATTTCTGAGCTAAMCTCMCTATAGMGACATTAAAAGAAATC
GCTCACTGTGACCCATCCTTACTCTACTTGGCCAGGCCACAGTAAMCMGTGACCTTCAGAGCAGCT GCCACMCTGGCCATG[C/T]CCTGCCATTGAMCAGTGATTMGTTTGATCAAGCCATGGTGACACA AAMTGCATTGATCATGMTAGGAGCCCATGCTAGMGTACATTCTCTCAGATTTGMCCAGTGAM
WI-7900C 84 TATGATGTATTTCTGAGCTAAMCTCMCTATAGMGACATTAMAGAAATC
GCTCACTGTGACCCATCCTTACTCTACTTGGCCAGGCCACAGTAAMCMGTGACCTTCAGAGCAGCT GCCACMCTGGCCATGCCCTGCCATTGAMCAGTGATTMGTTTGATCMGCCATGGTGA[C/T]ACA AAAATGCATTGATCATGMTAGGAGCCCATGCTAGMGTACATTCTCTCAGATTTGMCCAGTGAM
Wl-7900b 1 28 T TATGATGTATTTCTGAGCTAAAACTCAACTATAGAAGACATTMAAGAAATC
GCTCACTGTGACCCATCCTTACTCTACTTGGCCAGGCCACAGTAAMCMGTGACCTTCAGAGCAGCT GCCACMCTGGCCATG[C/T]CCTGCCATTGAMCAGTGATTMGTTTGATCMGCCATGGTGACACA AAMTGCATTGATCATGMTAGGAGCCCATGCTAGMGTACATTCTCTCAGATTTGMCCAGTGAM
WI-7900 84 TATGATGTATTTCTGAGCTAAAACTCAACTATAGAAGACATTAAAAGAAATC
AGACTTAGGTACAATTGCTCCCCTTTTTATATA[C/T]AGACACACACAGGACACATATATTAMCAG ATTGTTTCATCATTGCATCTATTTTCCATATAGTCATCMGAGACCATTTTATAAMCATGGTMGAC CCTTTTTAAMCAMCTCCAGGCCCTTGGTTGCGGGTCGCTGGGTTATTGGGGCAGCGCCGTGGTCGT
WI-7901 C 33 CACTCAGTCGCTCTGCATGCTCTCTGTCATACAGACAGGTMCCTAGTTCT
AGACTTAGGTACAATTGCTCCCCTTTTTATATA[C/T]AGACACACACAGGACACATATATTAMCAG ATTGTTTCATCATTGCATCTATTTTCCATATAGTCATCAAGAGACCATTTTATAAMCATGGTMGAC CCTTTTTAAMCAMCTCCAGGCCCTTGGTTGCGGGTCGCTGGGTTATTGGGGCAGCGCCGTGGTCGT
Wl-7901 b 33 CACTCAGTCGCTCTGCATGCTCTCTGTCATACAGACAGGTMCCTAGTTCT
ACMTCTCAGMGGACTGTGCMGTCMTGAGTCGCTTGTGMTTCTCATCTGGAM[C/T]GATCCC ACGTCTTAGMCCTTCACCACMGGAG I I I I I CTTGTAGTGATTCTCAMGTCTTGGTAGGCATTCGA ACTGGTCCTTTCACTTTGAGATTCTTTTCTTTTGCGCCTCTTATCMGTCAGCACACACCTTTTCCMG
Wl-8021 b 57 GATTTTACGTTGCGGCTTGTTAGGGGTGATTCGMTTCGGTGMTTGCCA
ACMTCTCAGMGGACTGTGCMGTCMTGAGTCGCTTGTGMTTCTCATCTGGAM[C/T]GATCCC ACGTCTTAGMCCTTCACCACMGGAGTTTTTCTTGTAGTGATTCTCAMGTCTTGGTAGGCATTCGA ACTGGTCCTTTCACTTTGAGATTCTTTTCTTTTGCGCCTCTTATCMGTCAGCACACACCTTTTCCMG
WI-8021 57 GATTTTACGTTGCGGCTTGTTAGGGGTGATTCGMTTCGGTGMTTGCCA
CTGAAMTTTACTATGCTCTCCACMCMGAGCTCCCATTTTCCACAGACACAGTCMTGTCAGTCA GCTTGTATTCAGGAGGACAGGGCAGAGGGATCCCAGTGGCACTTCCCATGGGMGACAGMGAGAGT GGGCCCCAGAGATGGMGGACCCCAGTGTCATCACCAMCMCCATTTCAGCCGCTCTAGCCTCTM
WI-8024C 206 TTCCC[A/G]CTCTAGMCAGCTGGCCCTGGTCGTCAGTACACMGGAMGAGC
CTGAAAATTTACTATGCTCTCCACAACMGAGCTCCCATTTTCCACAGACACAGTCMTGTCAGTCA GCTTGTATTCAGGAGGACAGGGCAGAGGGATCCCAGTGGCACTTCCCATGGGMGACAGMGAGAGT GGGCCCCAGAGATGGMGGACCCCAGTGTCATCACCAMCMCCATTTCAGCCGCTCTAGCCTCTM
Wl-8024b 206 TTCCC[A/G]CTCTAGMCAGCTGGCCCTGGTCGTCAGTACACMGGMAGAGC
GMTGAGCCTTCCTAGCGCCGAGGGACCTGCTGCTGTTGTTGGCCTGCACATGCATTCTATGGMTGC TTTTTGGCCMGCGGGGGCACTGAGGACTMGCTCTGANNNNNNNNNATCTCGCCCAMCTCCTTTCT MGGAGTCTGGGGTGTCATGCCCTACAMCC[A/G]TAAATTCTCATCAGATGGATTTTATTTMCGTT
WI-8077 1 67 GTGTATTGTGACTTACTTTCCAATCTGACTCTGGCATMCMGGGMAAA
TCTAGGTTTMTCMAGCMTTTGCANTTTGGATTTTGGMTGACCACTCCCTTGCTMGGMGCTAT GTACTTCATGCTGTGGAMCTGGCAMTACAGMTGTAGCTTGTTT[G/C]TTTTCTTAGCCTTGMGA TGACCAGGTAGAGAGACAGAGTGAGACCMCAGTTTTTCTGATTTCCCTGCTCCTCCTATTCCTTCCT
Wl-81 18f 1 1 4 AAAMTCAGACTCATTGTGACCAGTAGTCTTGAGGACTCMGCTGMTGA
TCTAGGTTTMTCMAGCMTTTGCANTTTGGATTTTGGA[A/G]TGACCACTCCCTTGCTMGGMGC TATGTACTTCATGCTGTGGAMCTGGCAMTACAGMTGTAGCTTGTTTGTTTTCTTAGCCTTGMGA TGACCAGGTAGAGAGACAGAGTGAGACCMCAGTTTTTCTGATTTCCCTGCTCCTCCTATTCCTTCCT
WI-8118e 40 A G AAAAATCAGACTCATTGTGACCAGTAGTCTTGAGGACTCMGCTGMTGA
TCTAGGTTTMTCMAGCMTTTGCANTTTGGATTTTGGMTGACCACTCCCTTGCTMGGMGCTAT GTACTTCATGCTGTGGAMCTGGCMATACAGMTGTAGCTTGTTTGTTT[T/G]CTTAGCCTTGMGA TGACCAGGTAGAGAGACAGAGTGAGACCMCAGTTTTTCTGATTTCCCTGCTCCTCCTATTCCTTCCT
Wl-81 18d 1 1 8 T G AAAMTCAGACTCATTGTGACCAGTAGTCTTGAGGACTCMGCTGMTGA
TTTTTAMTATGCCCGTTTAGAGCAGACACAGTCACMTMMGTTMAMGTTACMTGTGTCCAG TGTATATACCCAGGNMTCCATTCTTGGTACTTTTCMGAGCTGCTGTTATACTGAGTCTCTGAGMG TCCCCTTAGATMTAGCTGCCACTTTTCAGTATGGTTCAGMT[G/A]AGTATCTTAGTATTCTTTCTA
WI-8321 1 78 TTTTGCTATGGTTCTAGTTTATCMCCTACTTTATTAGCTGMCTGTTGGC
TTTTTMATATGCCCGTTTAGAGCAGACACAGTCACMTMMGTTMAMGTTACMTGTGTCCAG TGTATATACCCAGGNMTCCATTCTTGGTACTTTTCMGAGCTGCTGTTATACTGAGTCTCTGAGMG TCCCCTTAGATAATAGCTGCCACTTTTCAGTATGGTTCAGAAT[G/A]AGTATCTTAGTATTCTTTCTA
WI-8321 1 78 TTTTGCTATGGTTCTAGTTTATCMCCTACTTTATTAGCTGMCTGTTGGC
TATGTACTCACTTTCAGTTACCCCCGTGCCTCCAGMTCGCATGTTGCTCCACCTGGGGGCGGATATA MTTACCTCTAGATTGTCCAMGCCCAGTCTTTCCCTTCCCTGTGCAGCCTTAGA[A/C]ACTMGTAG CAGTACTGTTTGGTGTGTGTTTGTTTCTTCCCCAGCMTGCCTACTGCAGCTACTTAGTMCMCTAG
Wl-8332b 123 AGGTGGAGGGTNTCCGGGGMGCAGTTAGATGAGTTMGTGTGATGCACA
TATGTACTCACTTTCAGTTACCCCCGTGCCTCCAGMTCGCATGTTGCTCCACCTGGGGGCGGATATA MTTACCTCTAGATTGTCCAMGCCCAGTCTTTCCCTTCCCTGTGC[A/C]GCCTTAGAMCTMGTAG CAGTACTGTTTGGTGTGTGTTTGTTTCTTCCCCAGCMTGCCTACTGCAGCTACTTAGTMCMCTAG
Wl-8332 1 1 4 AGGTGGAGGGTNTCCGGGGMGCAGTTAGATGAGTTMGTGTGATGCACA
TGCGGGCTTMCAGGMGCATGACTGGGAGGCCTCAGGMGCTTATMTCATGGCAGMGGCGMGG GGMGCMGGACCTTCTTCACATGGCAGCAGGAGAMGAGMGMGGGAGMGTCTACACACTTTT AMCMCCAGATCTCATGAGANTTCCATCGGGAGACAGCACTAGGGGGATGGCACTAMCCATTAGA
Wl-8378b 31 1 MCTGCCCCCATGATCCMTCACCTNTCACCAGGCCCCTCCTCCMCACGTGGGG
TGCGGGCTTMCAGGMGCATGACTGGGAGGCCTCAGGMGCTTATMTCATGGCAGMGGCGMGG GGMGCMGGACCTTCTTCACATGGCAGCAGGAGAMGAGMGMGGGAGMGTCTACACACTTTT MACMCCAGATCTCATGAGANTTCCATCGGGAGACAGCACTAGGGGGATGGCACTAMCCATTAGA
WI-8378 308 MCTGCCCCCATGATCCMTCACCTNTCACCAGGCCCCTCCTCCMCACGTGGGG
AGCACATATTTAGCATTMGCCTCMACGATACAGCMTATGTTACATTCTCTTGTGAAMCAG TTGTTGTAGACTGTTMNNNNNNNNMATGTMCTCCGACTTGTGCCTMTAGGATTTGACCNTTAA GAGGNTTCTTTTGCTGTGGANGGGGTGGCTTTGCTTGMCTTCCATTCTG[T/G]GCCTTGTAGCTGGTG
WI-8426 1 84 G AGGCTGGGAGTATGGANGGNCCCGGGGCCCTTGGCNATNGNATTCAGTGAG
TTGAGCCTCCACAMTMTGCAACCMGTTTTACATTTTTMCAGCCCTTCTACATACACT[C/A]CA TCTTCTCTATCTTAGTTCCMGTTTTAGTTTTCMTCCCMTTATACCMTTCCATTGTTATTTTMGA MAMCCTTCCCAGTTATTGTCAGAMCTATGATTTAGCTTACCCCCTCCACTACCCAGCAMCTAC
Wl-8450h 61 C A AGAGAGGATGGGAGTGTMTATGAGCAGTACAGAGTCTTMTGCMTTCAT
GGCCACTGTCCAMGTCTGTCACAGTCCTCCATATGGCAMGATGMGMMTTGGCMTCTTTTTA
GGGGTACCMGGNTCTGAGTTTGTACGGTCTTTATAMTGCAGAGCMGATGTGGCTTTCCTGCCCC[C/A]ATTTCACCTCMGGCATCTTCAGCMCCCCACATGGCTTCCCTCTGTGCGCATGAMTMCTTG
Wl-9676h 1 34 AGGCCAGGGTCTCTCAGCTTTAMGCCTTGGMTCCTATGCATTGTTTGTTT
GGCCACTGTCCAAAGTCTGTCACAGTCCTCCATATGGCAMGATGAAGAAAATTGGCMTCTTTTTA GGGGTACCMGGNTCTGAGTTTGTACGGTCTTTATAMTGCAGAGCAAGATGTGGCTTTCCTGCCCCC ATTTCACCTCMGGCATCTTCAGCMCCCCACATGGCTTCCCTCTGTGCGCATGAMTMCTTGAGG[C/T]
Wl-9676g 202 CAGGGTCTCTCAGCTTTAMGCCTTGGAATCCTATGCATTGTTTGTTT
GGCCACTGTCCAMGTCTGTCACAGTCCTCCATATGGCAMGATGMGMMTTGGCMTCTTTTTA GGGGTACCMGGNTCTGAGTTTGTACGGTCTTTATAMTGCAGAGCMGATGTGGCTTTCCTGCCCCC ATTTCACCTCMGGCATCTTCAGCAACCCCACATGGCTTCCCTCTGTGC[G/T]CATGAMTAACTTGA
Wl-9676f 1 84 GGCCAGGGTCTCTCAGCTTTAMGCCTTGGMTCCTATGCATTGTTTGTTT
GGCCACTGTCCAMGTCTGTCACAGTCCTCCATATGGCAMGATGAAGAAMTTGGCMTCTTTTTA GGGGTACCMGGNTCTGAGTTTGTACGGTCTTTATAMTGCAGAGCMGATGTGGCTTTCCTGCCCCC ATTTCACCTCMGGCATCTTCAGCAACCCCACATGGCT[T/C]CCCTCTGTGCGCATGAMTMCTTGA
Wl-9676e 1 73 T GGCCAGGGTCTCTCAGCTTTAMGCCTTGGMTCCTATGCATTGTTTGTTT
GGCCACTGTCCAMGTCTGTCACAGTCCTCCATATGGCAMGATGMGMMTTGGCMTCTTTTTA GGGGTACCMGGNTCTGAGTTTGTACGGTCTTTATAMTGCAGAGCMGATGTGGCTTTCCTGCCCC[C/A]
ATTTCACCTCMGGCATCTTCAGCMCCCCACATGGCTTCCCTCTGTGCGCATGAMTMCTTG
Wl-9676d 1 34 AGGCCAGGGTCTCTCAGCTTTAMGCCTTGGMTCCTATGCATTGTTTGTTT
GGCCACTGTCCAMGTCTGTCACAGTCCTCCATATGGCAMGATGMGAAMTTGGCMTCTTTTTA
GGGGTACCMGGNTCTGAGTTTGTACGGTCTTTATAAATGCAGAGCA[A/G]GATGTGGCTTTCCTGCC CCCATTTCACCTCMGGCATCTTCAGCMCCCCACATGGCTTCCCTCTGTGCGCATGAMTMCTTGA
WI-9676C 1 1 4 GGCCAGGGTCTCTCAGCTTTAMGCCTTGGMTCCTATGCATTGTTTGTTT
GGCCACTGTCCAAAGTCTGTCACAGTCCTCCATATGGCAMGATGMGAAMTTGGCMTCTTTTTA GGGGTACCMGGNTCTGAGTTTGTA[C/T]GGTCTTTATAMTGCAGAGCMGATGTGGCTTTCCTGCC CCCATTTCACCTCMGGCATCTTCAGCMCCCCACATGGCTTCCCTCTGTGCGCATGAMTMCTTGA
Wl-9676b 92 GGCCAGGGTCTCTCAGCTTTAMGCCTTGGMTCCTATGCATTGTTTGTTT
GGCCACTGTCCAMGTCTGTCACAGTCCTCCATATGGCAMGATGMGMMTTGGCMTCTTTTTA GGGGTACCMGGNTCTG[A/C]GTTTGTACGGTCTTTATAMTGCAGAGCMGATGTGGCTTTCCTGCC CCCATTTCACCTCMGGCATCTTCAGCMCCCCACATGGCTTCCCTCTGTGCGCATGAMTMCTTGA
WI-9676a 84 A C GGCCAGGGTCTCTCAGCTTTAMGCCTTGGMTCCTATGCATTGTTTGTTT
TGGACCAMCACAGACAGATGTATTCCTGGTGCCTGTGTA[C/A]ATTACMCTCATTGATCACATGC AGCMCATCMCATCTCMGGAGTCCATTTGTTCAAAACACAGTAMTGACTCCACATTTTCCCTTT GAGTCMCAAMGACTCTGCTTGTCACCTTGCCTGGAGCGGGGTGGTTTTTCACTATGTGAGTATCTA
Wl-9738b 40 TCTTTTTATTTCTGTCCCTTATGTTGGTGGGCACATGTCTGTATTGCTGTCC
TGGACCAMCACAGACAGATGTATTCCTGGTGCCTGTGTA[C/A]ATTACMCTCATTGATCACATGC AGCMCATCMCATCTCMGGAGTCCATTTGTTCAAMCACAGTAAATGACTCCACATTTTCCCTTT GAGTCMCAAMGACTCTGCTTGTCACCTTGCCTGGAGCGGGGTGGTTTTTCACTATGTGAGTATCTA
WI-9738 40 TCTTTTTATTTCTGTCCCTTATGTTGGTGGGCACATGTCTGTATTGCTGTCC
ACTGAMTGTAMTGGCCMGGCACCCAGGACCTTAAAMTCATMGMGTTMTCTGTGGGMM GAGTMCTACAAMGCATCTAMCMGAGCAGGATGTGATGTAATGTGTCCCCTTATCACTTTAGTC AGTAAAGATMGMAGCCCTGGTGAGTATCCACTTCCACAMCACACAGMTATACACTTTTGGMG
WI-9756 47 ATΓTCCACTTMCCACTTGATTCTTCAC I I I I I I ATGATTTAAAACTCTCCGTGG
GATGGTCCCTTMGGATTTGCATTGGTTMTGGGCAGACTGGTGCAAMGAGGCTGMTTGMTMT TAGGMACTGGGAGMTTCAATTCAMGMGMTTCTTGTTCGCMGGTCMTTTTTATACTATTTA A[A/G]TAAAATAACTCTGGTAGGTTCTATAGCAMTGCTAAGTAAAGTAACCGCTGGTTTCTAAATT
WI-9758 1 35 A G ATTACG
ATTTAAATCCAGGCAGCGGGGAAMTGGATACTTTCATATGTCTCTGTACCCMCTATAMCTTTTG GTTCTCATGCACCATTTTCATTTTGCCTTCTCACTCCMGTACCACTGATTTTACCMTT[G/A]CTCTC ATMTTGACTTTGCTACTGGMGAMCTCTTAGMTGTTGGMTTTCTCTATTACACACTTTGCCTCA
WI-9778 1 27 MGMTGTGTCAGTCAGGACTAMGGCMTAGTCTCAGGGCAGACAGCC
TCTCCCCTTTGCCTCCTCATGCCCACTCCCTCAGCCTGCACAGAGCGTTTCTCCAGTGTAGTCTCTGGT CCATCTGCATCAAMTCACCTGCAGGACTTGCTGACMTGCAGTTTC[C/A]TGGATCCCACCCAGGA CTCAAAAAMCTAGGMTTGGGAGMGAGGGACCTGGMTCGGTGTTGCTAGCMGCCCCCAGGTGG
WI-9832 1 1 6 A TTTGTMGTGGACTAMGTTTGAGGACCAGACATGGMGGTTGGCTTTGGC
TGGAMMTAGC" TTTATCAATCTCTGATATGCTACATATGTCATGGAGAMTGCAGMTGGCATGA TATGAAATTCCA" TTTTGAATGAATAAAATATAC[A/G]TGTGTATGTATATATACTTATFAACACTT
AGGATTATATACACACMTMAACGTCTGTMGGATAMCTMGGTTCTATCAGTGGGAMTGAGA
WI-9841 1 01 G- TTGAAMGAGGGGGATGTGTTACTTGATATGCTGTTG
GMCTMCACCTTTCTTGCATGGA I I I I I CTTGATTATTGGCAGTTAACMTMMTGTTATTAGATC ACTGGTGCTTCTGTGTGGGGTTGAG I I I I I l ATGATATCTCCTGTTAGACCCATMGGGAGGCTGTGA GTTGTTTTCTACATCCTTGGACTATATAAGATCCTCTTTTMMTTATATTTTATATMGCACATGAA
W1-9880C 222 G A AATGGAATGAAATAATGA[G/A]TTGACATAGGAATTACCTACATATTTTG
AGTATACAMCATTTMGCTGTGGTCMGGCTACAGATGTGCTGACMGGCACTTCATGTAMGTGT CAGMGGAGCTACAAMCCTACCCTCA[A/G]TGAGCATGGTACTTGGCCTTTGGAGGMCMTCGGC TGCATTGMGATCCAGCTGCCTATTGATTTMGCTTTCCTGTTGMTGACAMGTATGTGGTTTTGTA
DWU-252 94 AT
GMCATTCCTCTGCAGCACTTCACTACCAMTGAGCATTAGCTACTTTTCAGMTTGMGGAGAAM TGCATTATGTGGACTGAA[C/T]CGACTTTTCTAAAGCTCTGMCAAAAGCTTTTCTTTCCTTTTGCM CMGACAMGCMAGCCACATTTTGCATTAGACAGATGACGGCTGCTCGMGMCMTGTCAGAM
DWU-330 85 CTCGATGMTGTGTTGATTTGAGAMTTTTACTGACAGAAATGCMTCTCCCT
GAAAATGTTAATTGGGCAGGTGAAMGGGTACAGATGTGCTGTAGCAGACCTTTGGTTTTAAMGAG
MGCATCATTTCCCCMCAGGGCMCTGTAGMGGCCAGCTGMGAGTAMGGAAMGGTCTGAGG
ACTGAGCCTGTGGCTGGCTGGAAAMGGTGMTGTTGAGGGCCCTTCACTTCCATCACMGAMGTC
DWU-370 231 i ATTAGACGGTACCMTTCAGTGTCTGTTCCT[A/G]GCATCTATTTCCTCTGTGC
CTCTTAACTTCAGTTCCCTCATCTATAAGMTAAGGGATTCAGTTGTGATCACATAGCTCAGGTAATC
DWU- CAGGACCAGAMCCCAGGAGC[A/G]TGGGACCTGATCCACAGCTAGAGGATGGGGGACTCTGTAGCT 1537b 89 ACAGCATTTTCCTGMCACACMGMATCCAGTMGCAGCACACACTGGCTGA
CTCTTMCTTCAGTTCCCTCATCTATMGMTMGGGATTCAGTTGTGATCA[C/T]ATAGCTCAGGTA! -4
DWU- ATCCAGGACCAGAMCCCAGGAGCATGGGACCTGATCCACAGCTAGAGGATGGGGGACTCTGTAGCT 1537a 52 T ACAGCATTTTCCTGMCACACMGAMTCCAGTMGCAGCACACACTGGCTGA
ACCATCTTATACTATGGCAGGTMGTCCATACAGMGAGCCCTCTCTCCCTGGGATTTGAGTGGGGTC CCCAGCTCCACCCAGAGGCCCCTGGGGMTTCCAGGGTCACTGTTCCTTCCTGTCTCCCTGTGGGMT
ESTD- CMGCCAGCTCCAGGCCAGMGTGGGACTGTGAGGACATGGAGGCCTCGGCACTGAGCTG[C/G]AGA ADAb 1 96 CCCGCAGACCMCTCCTGAGCTTTCTGGGCCTCTGAGTCTTGTCCTC
ACCATCTTATACTATGGCAGGTMGTCCATACAGMGAGCCCTCTCTCCCTGGGATTTGAGTGGGGTC CCCAGCTCCACCCAGAGGCCCCTGGGGAATTCCAGGGTCACTGTTCCTTCCTGTCTCCCTGTGGGMT
ESTD- CMGCCAGCTCCAGGCCAGMGTGGGACTGTGAGGACATGGAGGCCTC[G/A]GCACTGAGCTGCAGA ADAa 1 84 CCCGCAGACCMCTCCTGAGCTTTCTGGGCCTCTGAGTCTTGTCCTC
TCTCCTGTCATTCCTACTCCATTAGTTCMGGTCAGTGMGMCTGGGGCMTTMCCMGTMTTCA
ESTD- TGGACTGCCCMCTGCGMACMGMGGGCGCAGTGGAGCAGGAGTATTATGCTACGCGGTTACCTT ANT1 1 60 T TTTTTATGGAGGACCGMCTGAGGC[T/C]GAGCTCAGATGATCCTGT
TGCCTGGGGTGGCMGGCTGCAMCMGGAGGCMCCCAGGAGGCTTTTATGMGCGGGCCATGGTA
EST10398I AGATGCTGCCACCTCTTATCTACTTGATGATGTTCACATTTGGGGCTTGACTTTCCMCACGGAGMG 2b 1 68 A G CATTGTTTTCTTCGGGCCMGMGGTATCTACC[A/G]ATAGTGTCTATTAGGCATTTG
AAAGCATGAC CGCTTATGTTA AATAAAATGA ATAGTMTTCC CMGTGAATATTGATACATGGCTGACMAGCATGACMTMMTGMCAC[A/G]TACGGGMTTAC
WI-17904 50 G ACAC 03 TATTAACATMGCGATAACATCAAAACATCTGGTAMATGCAGTTAAAACMCAACACAMTGA
TGCCAMTAC MCTACTAGCG G I I I M I CTTTGAGTGACACMGCTTGTTCA I I I I I GAGAAMTGTGTGCCMATACTCMGTGTGM
EST34149 TCMGTGTGA AGMCAACTA T[A G]GATTTTATTAGTTGTTCTCGCTAGTAGT ΓGGTATTCTATGMAMMGCAGCTAGTTCAGC 5 69 AT ATAAAATC TT ACAAATCACACAAGT
TGGGAAMCATMGTTMCTCMGMTATATFCCAGTCTTTATGTTACTMMCATTGTMTAGTGT
EST34343 TTTTATCAATGATGCCGAGGTCACTGCT[C/A]TACAMGATTAMGMACTTACCATCAAACACTTC 8 95 CAGTGCATCM
GGACCATATG CAGAMTTATG GGTACACAATTTTMTGGMGGMCCACAGGTATGTTGAMGMCATCAGTACAGCTGGAGACAGG ATATATAACT TGATAATAACT GAGGGACCATATGATATATMCTCCTAMAGC[C ]GGMGGAGTTATTATCACATMTTTCTGGGC
WI-17982 98 CCTAAMGC CCTTCC GCTACAGMG I I I I I CATCA
CTCAGTMCTCCGGTGTATMTCTGCCATTTATTGATTTATTTATGATAAMCMCCTCTCATTGTGA AAMCAGCTMGGGTGACATCTCCAGACCCMCCACTGTCCCTGTMTGT[A/C]CTGCTGAGAGTCC
WI-17993 1 1 8 ACATTTTGGAAATCCAAT
CCCATCCAGAMCCCCAGTGTGATGGTGGMGCAGCATGAAMCMCATCTCCCCAGGCCTCGCAGT
GTAGAGGCGA AGGCACATGGG AGAGGCGAAGGGMCAG[A/G]GCTGCCCATGTGCCTGTCTCTAMGACGCCACCCTCAGGTTGATGT
WI-17996 84 G AGGGMCAG CAGC CACCTGTGGGAGACCGGGT
ATTCTTTATAAAAACACCATGTCCCTAAAATGT[C/G]ATTCMCATATATGCACACCTTCGATGTAT
WI-17136 33 AGGACACTGATCAAMMGACAGAGAMTGTGTCCCT
GCCACTGAAAAMGGTGCTCTTCC[A/C]GTTTCTMCTCCCTGGACTCCCTCATTGGMCTGMGCTC ACAGATGTTTCAGCTGGACTAGTTTAGACTTTGCTGTATTTTAMAGGCAGTGTTGATGCTCCAGGAT
WI-18041 24 TCAAATACTTAATCA
EST35164 CACAGCCCTGC CCCTCTGGATT TTGMCCMGGCCCTMCAGATGACTCAGCAGGGCCTTCMGCACAGCCCTGCCCCC[A/G]TCTTGA 8a 57 G CCCC CTGMTCTCM GATTCAGMTCCAGAGGGTGCTCAGTCCTTGGTTTAGGTGCTTCTGTGACATTTCCTCTTG
AGCGMTGMMTGCTACATAGGCTCCCTGAGTTCTTTCATGTACGMTCTTGGTTACACATCTTAG[A/G]
Wl- ACAGCAGAGCTGCCTGAGGGAGGGTTGTGTTTMTGTCGTATGCATGCTCAGCACAGTGCTGGC 18052b 67 ATGGCCCATCCATGCTTT
CCTGAGTTCTT AGCGMTGMMTGCTACATAGGCTCCCTGAGTTCTTTCATGTACGMTC[T/C]TGGTTACACATCTT
Wl- TCATGTACGA CTCAGGCAGCT AGMCAGCAGAGCTGCCTGAGGGAGGGTTGTGTTTMTGTCGTATGCATGCTCAGCACAGTGCTGGC 18052a 50 ATC CTGCTGT ATGGCCCATCCATGCTTT
GGGAGTGGGG CGTCACCCTGC CTGTTGTGCTGAGMCAGMGGGGTCMGGGAGTGGGGGAGTAAAA[G/A]TGGMGCAGGGTGACG
WI-18054 46 GAGTAAAA TTCCA CATGCAGGAGTCCAGACAAMGACGGGTGATTTTGCTCAGGTTGGTAGCMCAGAGGTMTG
TCATCTGAGA CATTATAGGTA TCC I I I I I ATTCATGATTTGTTTCATCTGAGAATAMCTTCCTGTCTMTTTTCCAA[C/G]ACTATGTT
EST39236 ATAMCTTCCTj CTGAGTCATAC TAATGTATGACTCAGTACCTATMTGAGACTGGAMTATATTACCTGGCAMTGMTGAGGTGTCTC Ob 57 GTCT ATTAAACA
GCACMTTAA CAMCAGACCTTTGGTTTGAGCTCACCTGGTGACAGGAGACTCCTACCTGAMCAGGGATGCC[G/η
EST39294
CCTGAMCAG ACATAGTACCG TTCTCGGTACTATGTTTMTTGTGCTGAGCCAGCMCCCTCGAGTTACCCGGCCTTTTACCCCACGCC 4 63 GGATGCC AGAA AGCTCTGCTTGTCTGCAT
AGMAACATTCTGTCTGATCAGAGGMGATGTATGTAGAAMTCAGMTCTGACTGMTTCCTMA
EST39366 ATCTAT[T/C]ACACTGAGAGGAAAATGGAAAAGAAMTGTTTGCATAMGCTTTTCCCTGACTCTCA 2 GAGGGGTTCAGA
TGATTTGAGAC AAAMGCTGTAGCTGGCMGTCAMGTTTATTTTATGTGTGTAMTTCCCAGTTGAGCATTFTTTCAT
EST39371 CATTTGGATTA ATTTCACATTT TTGGATTAGCGTGAGAGG[A/G]AAAAATGTGAMTGTCTCMATCAAATGCTTCCTTCTMAGATTA 9 86 GCGTGAGAGG TT GACATTGCCCMCCCTGC _
ACMGTGACATATCCMCCMCC[A/G]TCCATCCCCACCTGTGCCCTATTCTTTCCTTGTGTTTCTTT AGAGCCTTTTCAGCTATTTCCTGTGMGCAMCTGCACGMGGCCTCCCCCGTACTCCTCCCCTGGM
WI-17177 23 G _
AGGTTCCTGGTTGCTCCCCACMTTTTGATT[C/T]GGTGGCTTCATAAGGGACCCAGGATTCTGCATT
EST39428 GCTCCCCACA GGTCCCTTATG TTCTGGGTGGGGCCTAGGTMTTCTGTTGCCTTTGGTCCACAGAGCACMTTAMGMGATCAGGTCT 8 31 T ATTTTGATT AAGCCACC GGCTGTTGC
GGCAGAGGM
EST39430 TMCTGATGTT CAGGGGTCGGG MTTTAGCAGAMCMTGMGTTGGCAGAGGMTAACTGATGTTC[A/C]CAATACCCCGACCCCTGA 2 45 C GTATTG CCCAGTACCTTTCCCTCAGGCCCAGGCTCCGGTGGAGGATGTCCTGGG
CTACTGACAT AAAGCCCTGTAMCTGMGCTAGACMCGTCMCTTTGGMGMAATMCAGGMCCTATTTATAT
EST39446 AGGGACTTCA TCCTGGAAMC ACGTAMTCACTTTCATACCTGCCTACTGACATAGGGACTTCAGAGTMTA[C/T]GGTTTATGTCAGT 7b 117. GAGTM TGACATAMCC TTTCCAGGATTGTTCTCCC
EST39465 MTGCAGGAG CMTCTCGGCC ATGGTGTCATTAGAGGGCCACAGGGGATGGGGGAGTAAMAATMCATAMCGMCTGMCAGAM 2 80 GGTGGC CCTCT TGCAGGAGGGTGGC[A/G]AGAGGGGCCGAGATFGGGTGTTCAGGGCAGAGAGGTGGMGACCAG
AMGATTCCT
EST39501 GTAGACATCT CACTTGCMTT TGCTTACMCCCATMCCATAGGCCATGTGTTCAGACATTCTTGACCMGCCTMAGATTCCTGTAG 0 81 MCATTAG CTGMGGCT ACATCTMCATTAG A/GJTAGCCTTCAGMTTGCMGTGCMGTTCAAGTCMACCMTTC
CACAAMTGGGACTGCTGMGAGTGGACAGTTGGACCTTACTTTGGTGACCCCATACATTTGTGGTCAI
Wl- CATGCTTTAGCCATAqA/C]CATGGTMCATTGACTATGGAGTCTTGTGAMGTGTMTGTGCGATG 1 8387b 84 A C - GCTATGTAGACATAAAGA
CAGGCAGGACTTCAGTGTCAGTATCCCTGCCTTCAGTCTTCTTTAGAMTCACATCTGTGTTCMTCC ATTGTTTAGAGGGAGTGTA M i l l CCTGTTCCA[C/T]GMGAGGACTTTTTGTTCACMTTGGATCAC
D63807 1 01 T ■ MTGCAGAGGAGTCTGTTCCTCCCCCGTCGGCTTCTCGGTGCTGGGAGGGTGACCTGTCCCAGATGAC
TGGGMCATGCGTGTGACCTC[T/C]ACAGCTACCTCTTCTATGGACTGGTTATTGCCAMCAGCCACA CTGTGGGACTCTTCTTAACTTAAATTTTAATTTATTTATACTATTTAG I I I I I ATAATTTATTTTTGAT TTCACAGTGTGTTTGTGATTGTTTGCTCTGAGAGTTCCCCCTGTCCCCTCCACCTTCCCTCACAGTGTG
D90145 21 C - TCTGGTG
EST14035 ATTATCACTCTCMAAATTTTGGTGTGTGTGTTTMGTACTTTCTTATTTATGAGCCCC[T/C]GAGGA 1 a 59 c - CCAGACATGTTATTATCMGCCCCTTATATACCATCTMT
EST16668 GCATTTTAAAATTCACATTGMTCATTATTTACTATTTATGATGTTTACATAACMTTCAGTATCATT 5 71 T ■ ATG[C/T]TGTAGATTTCAGATGTAGGTCGTCAATACTGAGCACTTATCT
EST16904 ACAGACTATCGCCAACTTATMTGCTTAAACTTTATGATCMTAGTAATAMTTACA[C/T]GAGATA 7 5_7 T TTCACACTTTATTATAAAATAGGGTTTGTGTMGATGA I I I I I CCCMCTGTAGGTTMCAT
EST21863 TTTTTMGTACCAGAGGCACTGCTGGMCAGGATGAAMCTGATACACC[A/G]GTTACTACTTACTC 9 49 G - TTCACTCTTCAMCTGATTCCCCTAMGACTTCTACTTAGCAM
EST21885 GGCTGTMGTAGAATCAMGGTTMGMCATTTTATGCACTTATTCCACAAACATTTACTGAGCATA
6 80 A CTAGGTGCTGGGA[G/A]TGTGACAGTGAGCAAAMACACM
EST22623 ATTTTAGTGCAAATGACAMGCCCAA[A/G]AGMCAGAGGATCAMTAAGATTGAAATGTATTACC 8a 26 G_ TTCTCATMGTATACGAAGTTTAACACMGTATGGGAGT
EST22644 AAAATGATTGMTTCAGCAAGTACATTTATGATCTATCTACATTGTTAAAACAGCACTAAAAATAA 2 98 G MA M I L L AAAATGATTATCCATTATTTACAG[A/G]AAATGTGGAAAAGATGGCTTTTAAACCC
EST23587 | CCTCATTTATTTAAAAAGACGGACATAAAAA[T/A]TATACAACAAAMACCCAAGTCACATTTCAG 1 31 A GAGGTAAMACTAAAMGTCTGATATGAMATATGGTGG
MAGATCTGGCATTATTCACATCATTCTMATATTTTGTAATTAC I I I I I CCATGAGTATTTTTTTCA
EST24246 TGTCCMGCATTTTMCTATCATTTTAGCGTAMTACC[T/C]GMTAACCCATAGTTACAGAATTGG
7 1 06 GTCTGTGTMCCTCMTT
TAGTTTMTTTTCTGMCCTTTGGCTTATAM I I I I I CTCAACTT[A/G]CATTTAAAMTGTATCAAT 45 GCACCTTCTTCAGTAGTACCACATGAAMTATAMCCTCGTTC
EST24435 CTTGMCTTCTGGTCTCMGTGGTACGTCCGTCTCMCCTCCCAAMTGATGCGATTACAGGCATMG
(3 73 CAGCC[G/A]TGCCTGACCCACATTTTCTTTATCCGATCTGTTGATGGACATTCAGGTTGTTTC
EST25089 TATTGTTGCATTATCAAAATGGTTA[T/C]AGTTTTCMTTAAAACTGTAATTGATTTCTATGTATAAA 6 25 ACAGCTTTGMGTTGTMATGTAGTTTCCMTCGTTAGTTMTGCTACATT
AGTTGCCAGCTCCCATGTACCAGCAGCTGGMTCTGMGGCGTGAGTCTTCATCTTAGGGCATCGCTC CTCCTCAC[G/A]CCACAMTCTGGTGCCTCTCTCTTGCTTACAMTGTCTAGGTCCCCACTGCCTGCT GGAMGMMCACACTCCTTTGCTTAGCCCACAGTTCTCCATTTCACTTGACCCCTGCCCACCTCTCC
U31416C 76 MCCTMCTGGCTTACTTCCT
AGTTGCCAGCTCCCATGTACCAGCAGCTGGMTCTGMGGCGTGAGTCTTCATCTTAGGGCATCGCTC [C/T]TCCTCACGCCACAMTCTGGTGCCTCTCTCTTGCTTACAMTGTCTAGGTCCCCACTGCCTGCTG GAMGMMCACACTCCTTTGCTTAGCCCACAGTTCTCCATTTCACTTGACCCCTGCCCACCTCTCCA
U31416b 68 ACCTMCTGGCTTACTTCCT
ACGGGTCACACAGAGAMCCTGAGTCTAGCCATGAGGGGCTTATGCTCCCMCTCACATTGTTCCTCC AGACCGCAGG[C/T]TCCCCCAGCCTCAGGTTGCTGGAGCTGTCACATGACTGCATCCTGCCTGCCAGG GCTGCAMGCMGGTCTTGCTTCTATCTGGGGGACGCTGCTCGAGAGAGGCCGAGAGGCCGCAGMC
U37519a 78 ATGCCAGGTGTCC
GACCACGCTGAMCCCACCCACCCGCTGTGCTGACCATGGGCCCTGAGCGTCCT[A/G]CCCCGMTTC ACGAGGCTGAGGCATCCGGGAGCTGGCGTMTGCCTGGCCGCAGTGTGTGTGTATCCCATACCCCACT
U37690 54 A! G CTGGMGGMCCATCCAGTAMGGTCTTT
TGAAACCGTTTCMCATGGAAATGATCTGTATTGACTM[T/C]ACACCAGTCCACACTTCTATGACT TCTGCCATTTCAMGACTCATTTCTCCTATMCCACCGCATGAGTTGMTCAAMTTTTCAGATCTTT TCAGGAGTGTMGGMACATCATGTTTACCTGTGCAGGCACTAGTCCTTTACAGATGACCATGCTGAT
V00540 39 A
TCMGMGGTGACTGCCCTTGTATGATGGGATGGGMGATGAATGACTGGTTTTTACTGGGGTGTM AACCACTCTGAGCCTCTCTGAGACCATGTGGTTTTAAM[A/ ATCCATMGGGMGGTACCCACAC CAGTATCTGAGTTCCAGTAGCTMGACCCTAGMTTTGGATTCATCTCTG I I I I I I CATGTCTCTCCTT
X15943 1 06 GTMCCCTGAGATCATCAG _
AGGMGATCCCACCGACCCTTCCTGGCCTMTCCTTTAGATTAGGTCACATTACATTMCATTTAGGA ACCCAGACCGAMAGTTGCTGAMGGGMGGAGACACATTCACAMGAAMGTTGCGMMTTGCG MATCTGTTGTGCA[C/T]GCTCAMTGMAACGCCTTTCGGCTTTGGGCTTTTATTTTTTTGGMCTG
X5201 1 b 1 48 T CGAGTGGCTTAGGTCTAGCCT
AGGMGATCCCACCGACCCTTCCTGGCCTMTCCTTTAGATTAGGTCACATTACATTMCATTTAGGA ACCCAGACCGAMAGTTGCTGMAGGGMGGAGACACATTCACAMGAM[A/C]GTTGCGAMATT
I GCGAMTCTGTTGTGCACGCTCAMTGMMCGCCTTTCGGCTTTGGGCTTTFA I I I I I I I GGMCTG X5201 1 a ' 1 1 8 C - CGAGTGGCTTAGGTCTAGCCT
CATCCCMGGCACTGGTGGTGACTCTGCTTCCTG[C/T]ACTGACCCAGAGCCTCTGCCTGTGCACTGC MGCTGTGTCTACTCAGGCCCCMGGGGACTCTCTGTTTCCATTCTCCCCCCACAGACCTGTCMGAG
X87344 34 T MGCATGACAMCMMTCATTTACCGACTTTAGTGCTTTTTT
GGTGGGCTGGTATCTCAGAMGTGCCTGACACACTMCCMGCTGAGTTTCCTATGGGMCMTTGA AGTAMCTTTTTGTTCTGGTCCTTTTTGGTCGAGGAGTMCMTACAMTGGATTTTGGGAGTGACTC MGMGTGAAGMTGCACMGMTGGATCACMGATGGMTTTA[G/T]CAMCCCTAGCCTTGCTT
X87838 1 79 G GTJAAAATT
GTTCTGCTGCCTCTACACAGGGGCCCTGTACAGTGMTGGTGCCATTTTCGMGGAGCAGCAGTGTGA CCTCCTGTGACCC[A/G]TGMTGTGCCTCCMGCGGCCCTGTGTGTTTGACATGTGMGCTATTTGAT ATGCACCAGGTCTCMGGTTCTCATTTCTCAGGTGACGTGATTCTMGGCAGGATTTGAGAGTTCACA
Z14138 81 GMGGAT
TMTCCTCACCATTCCTCAGGTATMGTTCTATAMCAGGCTTGGMTCTGGGTMTTMAMCAGA AMTTATAGTCMTATACCATGACATGMGMTGAATCCATTCTTTGGAGATGGAGTATACATGACT GCMCTGTATTTCATACGTTCTTTTCAMGTGGGATAGCTATTGCAGCTTAMGAGC[A/C]CAGGTTC
Z18859 1 91 CAGTACTGGTTTTCCM
AGMCCTGACCAGATGTGGCTCGGAGGGGMTCCAGACCCGCTGCTGTCTTGCTCTCCCTCCCCTCCC CACTCCTCCTCTCTTCTTCCTCTTCTCTCTCACTGCCACGCCTTCCTTTCCCTCCTCCTCCCCCTCTCCG CTCTGTGCTCTTCATTCTCAC[GA]GGCCCGCMCCCCTCCTCTCTCTGTCCCCGCCCGTCTCTGGMA
Z23091 1 59 G CTGAGCTTGACGTTTG
GTTGGCATTGTTAGTAAMCTTCATAGGTGMGAGGAGGATCAGTGAGATTMGTTATTTTATCAM GTGTGGTTTTCTGCMGGGCAGGTTTGAMCCTGACCCTAGTTGTGCTCCAGGACCTAIA/G]GCGTGC TCACTCTACCTTGTCTTTGTGTTGAMGGAGTGGTTTCCCATGACTGTTTMGTGACMGTGCCATGG
1 1595b 1 25 ATATCTACACCGTCACCAGACTAGATTGTCTCMTGTCCTTGGCTTGCGAC
GTTGGCATTGTTAGTAMACTTCATAGGTGMGAGGAGGATCAGTGAGATTMGTTATTTTATCAM GTGTGGTTTTCTGCMGGGCAGGTTTGAMCCTGACCCTAGTTGTGCTCCAGGACCTA[A/G]GCGTGC TCACTCTACCTTGTCTTTGTGTTGAMGGAGTGGTTTCCCATGACTGTTTMGTGACMGTGCCATGG
1 1 595 1 25 ATATCTACACCGTCACCAGACTAGATTGTCTCMTGTCCTTGGCTTGCGAC
TATATCACATTAGTATGTCACTGCCATGGTMGGACTTTGATCACTAGGAAATMGMCACTTTGM TGGTCTTGTCCTTTCMTMMAGAGTGACATGATTGMCATGTGTTTTAGATMAGGGCACTT[GtF ]GCAGGAGTGTTTAGGATGMGAGAGMGAGATTMGGMGATCAGGMGMMGTAGCMTGGGA
1241 1 31 G T ATGAMATAGGAGGCCCTGAGATCCACTGGATMTCTMAAMCCMGAGAMG
GTGCGATCACCACTACAGTCTMTTTCAGATGFTTTCATTACCCCTAMAGAMTCTTGTACCCATTA GCMTTATTCCTCATTCCTGCCCTCACCCCCAGGCCCTACTCTTTATCGCTATAGATTTGCC[C/ηACT TGACATATCATACACATGGAGCCATACATATGTGTGCCCTTCATGATTGGCTTCTTTCACTGAGMTA
1 282 1 30 ATGTTTTCAAGGT
AGTATCACACATACTTAATATATTAGATATACACMTMTMAATCACTCCCTACCTTGAAMCTTT A[C/ηAGAAGCATTTTTAATTTTACAACACAMGCTCAAACGMCCTACAATMGTCTAGTAGTCTG TTTACGTGCCAAGGGATAAGGCTGAACMTAMTTMCCCTTTAAAAATGTCTATGMCAAGTACAA
681 0 68 T TTTTC I I I I I GAGTTCTGCAGAGCAATGACCACTMGMATA I I I I I AMGGC
CCMGTACATTGGGTGMCGATGAGCTAGCTGTTCTAGTATTTGC I I I I I GTMTCCAGTTMGACCA TCAGCATATACAACATCATCACTMCTCMCMTGTAGCTGCAGGGTMC[A/C]TGTGGATACCCTG TGTGCTCTACTGGCCTCCAMGGCATTCAGGGGATCATCAMGATGTTGGACACCTTGTGTTCAMTC
681 7 1 1 8 πGGTTCAGGTGCGGCCTGTGCAGΛTCGGCTTTTTGGTTTGGTTGTCTTAG
CCATTTTA I I I I I CTCTAAATTTTAAAATAGMGACTTTMTGGAAAACATTFAGTACCATCATGTCA CCCTGMTGCCAGCMTACCTCGACTTTTACACACGCAGGMGCCTAGTAAMGCCCCGTCAGTAGT ACACATTTCTCTATGGTCCTTCMCAGTTTTGCATATACAAMTTTTCTGCTATTΓTGCTTTAGCAAA
6819b 21 2 CAGCMTMCTTTTGTGTTTCCTATATGACACCTAATATCCA
CCATTTTA 1 1 ΓI FCTCTAAATTTTAAAATAGAAGACTTTAATGGAAAACATTTAGTACCATCATGTCA CCCTGMTGCCAGCMTACCTCGACTTTTACACACGCAGGMGCCTAGTAMAGCCCCGTCAGTAGT
ACACATTFCTCTATGGTCCTTCAACAGTTTT[G ηCATATACAAMTTTTCTGCTATTTTGCTTTAGC
6819a 1 66 UT AMCAGCMTMCTTTTGTGTTTCCTATATGACACCTMTATCCA
CTGGTATGTCATAAGCMTCCATAATTGTTATAGCTATT[A/G1TTATACTATGGCACCATTTGGGACA
CAGATTATATATGTCAGACACCACGMTGTCCTTTMGATATGCAGCMGCACAMTCTGTCATGGT
681 xx 39 TTAACAAMGAAATGMCGTCTAGG
AGGATTCCCTCTTFTFCTATTGATTGGMTAGTTTCAGMGGMTGGTACCAGTTCCTCCTTGTACCT CTGGTAGMTTCGGCTGTGMTCCATCTGGTCCTGGACTCTTTTTGGTTGGTAMCTATTGATTATTGC CACMTTTCAGA[GtηCCTGTTATTGGTCTATTCAGAGATTCMCTTCTTCCTGGTTTAGTCTTGGGA
6972b 1 49 GAGTGTATGTGTCGAGGMT
AGGATTCCCTCTTTTTCTATTGATTGGMTAGTTTCAGMGGMTGGTACCAGTTCCTCCTTGTACCT CTGGTAGMTTCGGCTGTGMTCCATCTGGTCCTGGACTCTTTTTGGTTGGTM[A/G]CTATTGATTA TTGCCACMTTTCAGAGCCTGTTATTGGTCTATTCAGAGATTCMCTTCTTCCTGGTTTAGTCTTGGGAl
6972a 1 22 ' © ■ GAGTGTATGTGTCGAGGMT
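The rows above mark each polymorphic site inline, with the two alleles in square brackets (for example `[A/G]`) and the flanking sequence split into space-separated chunks. The following is a minimal sketch, not part of the original specification, of how such a row could be parsed once the bracket has been cleaned to the form `[X/Y]`. The function names and the example segment are hypothetical, and the sketch makes no assumption about which of the two bracketed bases is the reference form.

```python
import re

# One bracketed biallelic site such as "[A/G]" (assumed, cleaned-up notation).
SITE = re.compile(r"\[([ACGT])/([ACGT])\]")


def parse_segment(segment):
    """Split a segment into (left flank, first allele, second allele, right flank)."""
    compact = "".join(segment.split())  # drop spaces between sequence chunks
    match = SITE.search(compact)
    if match is None:
        raise ValueError("no [X/Y] polymorphic site found")
    left = compact[:match.start()]
    right = compact[match.end():]
    return left, match.group(1), match.group(2), right


def allele_sequences(segment):
    """Return both full-length allelic forms and the 0-based offset of the site."""
    left, first, second, right = parse_segment(segment)
    return left + first + right, left + second + right, len(left)


if __name__ == "__main__":
    # Hypothetical short segment, not taken from the Table.
    form_1, form_2, offset = allele_sequences("ACGT ACGT[A/G]TTCC")
    print(form_1, form_2, offset)
```

Rows whose bracket was damaged in extraction (for example a missing slash or closing bracket) are rejected with an error rather than guessed at.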
EQUIVALENTS
While this invention has been particularly shown and described with references to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. Those skilled in the art will recognize or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the invention described specifically herein. Such equivalents are intended to be encompassed in the scope of the claims.
Claims
1. A nucleic acid segment shown in column 7 of the Table, or a portion thereof which includes a polymorphic site, or the complement of the segment or portion thereof.
2. The nucleic acid segment of claim 1 that is DNA.
3. The nucleic acid segment of claim 1 that is RNA.
4. The segment of claim 1 that is less than 100 bases.
5. The segment of claim 1 that is less than 50 bases.
6. The segment of claim 1 that is less than 20 bases.
7. The segment of claim 1, wherein the polymorphic site is biallelic.
8. The segment of claim 1, wherein the polymorphic form occupying the polymorphic site is the reference base for the fragment listed in the Table, column 3.
9. The segment of claim 1, wherein the polymorphic form occupying the polymorphic site is an alternative form for the fragment listed in the Table, column 4.
10. An allele-specific oligonucleotide that hybridizes to a segment of a fragment shown in the Table, column 7 or its complement.
11. The allele-specific oligonucleotide of claim 10 that is a probe.
12. The allele-specific oligonucleotide of claim 10, wherein a central position of the probe aligns with the polymorphic site of the fragment.
13. The allele-specific oligonucleotide of claim 10 that is a primer.
14. The allele-specific oligonucleotide of claim 13, wherein the 3' end of the primer aligns with the polymorphic site of the fragment.
15. The allele-specific oligonucleotide of claim 10, which is selected from the group consisting of the nucleotide sequences of the Table, column 5.
16. The allele-specific oligonucleotide of claim 10, which is selected from the group consisting of the nucleotide sequences of the Table, column 6.
17. An isolated nucleic acid comprising a sequence of the Table, column 7 or the complement thereof, wherein the polymorphic site within the sequence or complement is occupied by a base other than the reference base shown in the Table, column 3.
18. A method of analyzing a nucleic acid, comprising obtaining the nucleic acid from an individual; and determining a base occupying any one of the polymorphic sites shown in the Table.
19. The method of claim 18, wherein the determining comprises determining a set of bases occupying a set of the polymorphic sites shown in the Table.
20. The method of claim 18, wherein the nucleic acid is obtained from a plurality of individuals, and a base occupying one of the polymorphic positions is determined in each of the individuals, and the method further comprising testing each individual for the presence of a disease phenotype, and correlating the presence of the disease phenotype with the base.
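The geometric constraints recited in the claims above (a probe whose central position aligns with the polymorphic site, a primer whose 3' end aligns with that site) and the correlation step of the final method claim can be illustrated with a short sketch. This is an illustration under stated assumptions, not the applicants' procedure: the function names, default lengths, and example sequences are hypothetical, and the Pearson chi-square statistic on a 2x2 table of counts is only one possible way to test for correlation; the claims do not name a statistical test.

```python
def centered_probe(left, allele, right, length=15):
    """Build a probe of odd length whose central base is the polymorphic base."""
    if length < 1 or length % 2 == 0:
        raise ValueError("length must be a positive odd number")
    half = length // 2
    if len(left) < half or len(right) < half:
        raise ValueError("flanking sequence too short for this probe length")
    return left[len(left) - half:] + allele + right[:half]


def allele_specific_primer(left, allele, length=20):
    """Build a forward primer whose 3' (last) base is the polymorphic base."""
    flank = length - 1
    if length < 1 or len(left) < flank:
        raise ValueError("left flank too short for this primer length")
    return left[len(left) - flank:] + allele


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table
    [[a, b], [c, d]], e.g. rows = phenotype present/absent,
    columns = first/second base observed at the site."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denom


if __name__ == "__main__":
    # Hypothetical flanks and counts, not taken from the Table.
    left, right = "ACGTACGT", "TTCCGG"
    print(centered_probe(left, "A", right, length=5))
    print(allele_specific_primer(left, "G", length=4))
    print(chi_square_2x2(20, 0, 0, 20))
```

A separate probe or primer would be built for each allelic form by passing the corresponding base as `allele`. A larger statistic indicates a stronger departure from independence between phenotype and base; judging significance would additionally require comparing it against a chi-square distribution with one degree of freedom.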
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US3045596P | 1996-11-06 | 1996-11-06 | |
US30455P | 1996-11-06 | ||
PCT/US1997/020313 WO1998020165A2 (en) | 1996-11-06 | 1997-11-05 | Biallelic markers |
Publications (1)
Publication Number | Publication Date |
---|---|
EP0941366A2 true EP0941366A2 (en) | 1999-09-15 |
Family
ID=21854280
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP97946582A Withdrawn EP0941366A2 (en) | 1996-11-06 | 1997-11-05 | Biallelic markers |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP0941366A2 (en) |
WO (1) | WO1998020165A2 (en) |
Families Citing this family (73)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6391550B1 (en) | 1996-09-19 | 2002-05-21 | Affymetrix, Inc. | Identification of molecular sequence signatures and methods involving the same |
US6759515B1 (en) | 1997-02-25 | 2004-07-06 | Corixa Corporation | Compositions and methods for the therapy and diagnosis of prostate cancer |
US6277977B1 (en) * | 1997-06-11 | 2001-08-21 | Smithkline Beecham Corporation | cDNA clone HAPOI67 that encodes a human 7-transmembrane receptor |
US7105353B2 (en) | 1997-07-18 | 2006-09-12 | Serono Genetics Institute S.A. | Methods of identifying individuals for inclusion in drug studies |
FR2767135B1 (en) | 1997-08-06 | 2002-07-12 | Genset Sa | LSR COMPLEX RECEPTOR, ACTIVITY, CLONING, AND APPLICATION TO DIAGNOSIS, PREVENTION AND / OR TREATMENT OF OBESITY AND THE RISKS OR COMPLICATIONS THEREOF |
US6849719B2 (en) | 1997-09-17 | 2005-02-01 | Human Genome Sciences, Inc. | Antibody to an IL-17 receptor like protein |
WO1999014240A1 (en) | 1997-09-17 | 1999-03-25 | Human Genome Sciences, Inc. | Interleukin-17 receptor-like protein |
US6482923B1 (en) | 1997-09-17 | 2002-11-19 | Human Genome Sciences, Inc. | Interleukin 17-like receptor protein |
ATE513042T1 (en) | 1998-02-05 | 2011-07-15 | Glaxosmithkline Biolog Sa | TUMOR-ASSOCIATED ANTIGEN DERIVATIVES OF THE MAGE FAMILY,NUCLIC ACID SEQUENCES ENCODING SAME FOR PRODUCING FUSION PROTEINS AND VACCINATION COMPOSITIONS |
AU2577699A (en) | 1998-02-06 | 1999-08-23 | Human Genome Sciences, Inc. | Human serine protease and serpin polypeptides |
US6692909B1 (en) * | 1998-04-01 | 2004-02-17 | Whitehead Institute For Biomedical Research | Coding sequence polymorphisms in vascular pathology genes |
CA2324869A1 (en) * | 1998-04-09 | 1999-10-21 | Whitehead Institute For Biomedical Research | Biallelic markers |
WO1999054500A2 (en) * | 1998-04-21 | 1999-10-28 | Genset | Biallelic markers for use in constructing a high density disequilibrium map of the human genome |
US6537751B1 (en) | 1998-04-21 | 2003-03-25 | Genset S.A. | Biallelic markers for use in constructing a high density disequilibrium map of the human genome |
US20020192751A1 (en) | 1998-05-15 | 2002-12-19 | Genentech, Inc. | Secreted and transmembrane polypeptides and nucleic acids encoding the same |
US6251592B1 (en) * | 1998-05-26 | 2001-06-26 | Procrea Biosciences Inc. | STR marker system for DNA fingerprinting |
US6759192B1 (en) | 1998-06-05 | 2004-07-06 | Genset S.A. | Polymorphic markers of prostate carcinoma tumor antigen-1(PCTA-1) |
CA2328500A1 (en) * | 1998-06-05 | 1999-12-16 | Genset | Polymorphic markers of prostate carcinoma tumor antigen-1 (pcta-1) |
US6825004B1 (en) | 1998-08-07 | 2004-11-30 | Genset S.A. | Nucleic acids encoding human TBC-1 protein and polymorphic markers thereof |
CA2337694A1 (en) * | 1998-08-07 | 2000-02-17 | Genset S.A. | Nucleic acids encoding human tbc-1 protein and polymorphic markers thereof |
US6703228B1 (en) | 1998-09-25 | 2004-03-09 | Massachusetts Institute Of Technology | Methods and products related to genotyping and DNA analysis |
DE69936379T2 (en) * | 1998-09-25 | 2008-02-28 | Massachusetts Institute Of Technology, Cambridge | METHOD FOR GENOTYPIZING AND DNA ANALYSIS |
US7067627B2 (en) | 1999-03-30 | 2006-06-27 | Serono Genetics Institute S.A. | Schizophrenia associated genes, proteins and biallelic markers |
US6476208B1 (en) | 1998-10-13 | 2002-11-05 | Genset | Schizophrenia associated genes, proteins and biallelic markers |
US6902892B1 (en) | 1998-10-19 | 2005-06-07 | Diadexus, Inc. | Method of diagnosing, monitoring, staging, imaging and treating prostate cancer |
WO2000024939A1 (en) | 1998-10-27 | 2000-05-04 | Affymetrix, Inc. | Complexity management and analysis of genomic dna |
JP2002528118A (en) | 1998-11-04 | 2002-09-03 | ジェンセット | Genomic and total cDNA sequences of APM1 specific to human adipocytes and their biallelic markers |
DE69920032T2 (en) * | 1998-11-10 | 2005-09-15 | Genset | METHODS, SOFTWARE AND APPARATUS FOR IDENTIFYING GENOMIC AREAS CONTAINING A GENE ASSOCIATED WITH A DETECTABLE CHARACTERISTIC |
US6670464B1 (en) * | 1998-11-17 | 2003-12-30 | Curagen Corporation | Nucleic acids containing single nucleotide polymorphisms and methods of use thereof |
US8367322B2 (en) | 1999-01-06 | 2013-02-05 | Cornell Research Foundation, Inc. | Accelerating identification of single nucleotide polymorphisms and alignment of clones in genomic sequencing |
EP1141384A2 (en) | 1999-01-06 | 2001-10-10 | Cornell Research Foundation, Inc. | Method for accelerating identification of single nucleotide polymorphisms and alignment of clones in genomic sequencing |
CA2359132A1 (en) | 1999-01-15 | 2000-07-20 | Roxanne D. Duan | Bone marrow-specific protein |
CA2359757A1 (en) | 1999-02-10 | 2000-08-17 | Genset S.A. | Polymorphic markers of the lsr gene |
US8133734B2 (en) | 1999-03-16 | 2012-03-13 | Human Genome Sciences, Inc. | Kit comprising an antibody to interleukin 17 receptor-like protein |
WO2000055375A1 (en) * | 1999-03-17 | 2000-09-21 | Alphagene, Inc. | Secreted proteins and polynucleotides encoding them |
AU780836B2 (en) | 1999-03-24 | 2005-04-21 | Serono Genetics Institute S.A. | Genomic sequence of the (purH) gene and (purH)-related biallelic markers |
EP1165836A2 (en) * | 1999-03-30 | 2002-01-02 | Genset | Schizophrenia associated genes, proteins and biallelic markers |
AU4050000A (en) * | 1999-03-31 | 2000-10-16 | Affymetrix, Inc. | Charaterization of single nucleotide polymorphisms in coding regions of human genes |
IL129734A0 (en) * | 1999-05-03 | 2000-02-29 | Compugen Ltd | Novel nucleic acid and amino acid sequences |
MXPA01011882A (en) * | 1999-05-25 | 2002-05-06 | Aventis Pharma Sa | Expression products of genes involved in diseases related to cholesterol metabolism. |
FR2794131B1 (en) * | 1999-05-25 | 2003-12-12 | Aventis Pharma Sa | GENE EXPRESSION PRODUCTS INVOLVED IN CONDITIONS ASSOCIATED WITH THE METABOLISM OF CHOLESTEROL |
AU781437B2 (en) * | 1999-06-25 | 2005-05-26 | Serono Genetics Institute S.A. | A novel BAP28 gene and protein |
EP1088900A1 (en) * | 1999-09-10 | 2001-04-04 | Epidauros Biotechnologie AG | Polymorphisms in the human CYP3A4, CYP3A7 and hPXR genes and their use in diagnostic and therapeutic applications |
US6555316B1 (en) | 1999-10-12 | 2003-04-29 | Genset S.A. | Schizophrenia associated gene, proteins and biallelic markers |
US6902890B1 (en) | 1999-11-04 | 2005-06-07 | Diadexus, Inc. | Method of diagnosing monitoring, staging, imaging and treating cancer |
AU1928801A (en) * | 1999-11-24 | 2001-06-04 | Curagen Corporation | Nucleic acids containing single nucleotide polymorphisms and methods of use thereof |
US6869762B1 (en) | 1999-12-10 | 2005-03-22 | Whitehead Institute For Biomedical Research | Crohn's disease-related polymorphisms |
AU2258601A (en) * | 1999-12-10 | 2001-06-18 | Ellipsis Biotherapeutics Corporation | Ibd-related polymorphisms |
EP1287013A2 (en) * | 1999-12-27 | 2003-03-05 | Curagen Corporation | Nucleic acids containing single nucleotide polymorphisms and methods of use thereof |
CA2395786A1 (en) * | 1999-12-27 | 2001-07-05 | Curagen Corporation | Nucleic acids containing single nucleotide polymorphisms and methods of use thereof |
CA2395926A1 (en) * | 1999-12-28 | 2001-07-05 | Curagen Corporation | Nucleic acids containing single nucleotide polymorphisms and methods of use thereof |
AU2630801A (en) * | 2000-01-07 | 2001-07-24 | Curagen Corporation | Nucleic acids containing single nucleotide polymorphisms and methods of use thereof |
US6989367B2 (en) | 2000-01-14 | 2006-01-24 | Genset S.A. | OBG3 globular head and uses thereof |
US20020058617A1 (en) | 2000-01-14 | 2002-05-16 | Joachim Fruebis | OBG3 globular head and uses thereof for decreasing body mass |
US7338787B2 (en) | 2000-01-14 | 2008-03-04 | Serono Genetics Institute S.A. | Nucleic acids encoding OBG3 globular head and uses thereof |
US6566332B2 (en) | 2000-01-14 | 2003-05-20 | Genset S.A. | OBG3 globular head and uses thereof for decreasing body mass |
US20020032319A1 (en) * | 2000-03-07 | 2002-03-14 | Whitehead Institute For Biomedical Research | Human single nucleotide polymorphisms |
AU2001253487A1 (en) * | 2000-04-13 | 2001-10-30 | Millennium Pharmaceuticals, Inc. | 23155 novel protein human 5-alpha reductases and uses therefor |
GB0016169D0 (en) * | 2000-06-30 | 2000-08-23 | Univ London | Diagnostic method |
AU2001235895B2 (en) * | 2001-02-20 | 2008-01-03 | Serono Genetics Institute S.A. | PG-3 and biallelic markers thereof |
FR2824333B1 (en) * | 2001-05-03 | 2003-08-08 | Genodyssee | NOVEL POLYNUCLEOTIDES AND POLYPEPTIDES OF IFN ALPHA 5 |
DE10122847A1 (en) * | 2001-05-11 | 2002-11-21 | Noxxon Pharma Ag | New nucleic acid that binds to staphylococcal enterotoxin B, useful for treating and diagnosing e.g. septic shock, identified by the SELEX method |
JP4336877B2 (en) * | 2003-04-18 | 2009-09-30 | アークレイ株式会社 | Method for detecting β3 adrenergic receptor mutant gene and nucleic acid probe and kit therefor |
US20090148458A1 (en) * | 2005-06-23 | 2009-06-11 | The University Of British Columbia | Coagulation factor iii polymorphisms associated with prediction of subject outcome and response to therapy |
WO2007035600A2 (en) * | 2005-09-16 | 2007-03-29 | Mayo Foundation For Education And Research | Natriuretic activities |
US9388457B2 (en) | 2007-09-14 | 2016-07-12 | Affymetrix, Inc. | Locus specific amplification using array probes |
US9074244B2 (en) | 2008-03-11 | 2015-07-07 | Affymetrix, Inc. | Array-based translocation and rearrangement assays |
WO2009143576A1 (en) * | 2008-05-27 | 2009-12-03 | Adelaide Research & Innovation Pty Ltd | Polymorphisms associated with pregnancy complications |
US20120108514A1 (en) | 2009-07-09 | 2012-05-03 | University Of Iowa Research Foundation | Long acting atrial natriuretic peptide (la-anp) and methods for use thereof |
WO2011151405A1 (en) | 2010-06-04 | 2011-12-08 | Institut National De La Sante Et De La Recherche Medicale (Inserm) | Constitutively active prolactin receptor variants as prognostic markers and therapeutic targets to prevent progression of hormone-dependent cancers towards hormone-independence |
EP2751136B1 (en) | 2011-08-30 | 2017-10-18 | Mayo Foundation For Medical Education And Research | Natriuretic polypeptides |
US9611305B2 (en) | 2012-01-06 | 2017-04-04 | Mayo Foundation For Medical Education And Research | Treating cardiovascular or renal diseases |
CN111139301B (en) * | 2020-03-10 | 2020-12-18 | 无锡市第五人民医院 | Breast cancer related gene ERBB2 site g.39397319C > A mutant and application thereof |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0726905B1 (en) * | 1993-11-03 | 2005-03-23 | Orchid BioSciences, Inc. | Single nucleotide polymorphisms and their use in genetic analysis |
FR2722295B1 (en) * | 1994-07-07 | 1996-10-04 | Roussy Inst Gustave | METHOD OF ANALYSIS OF SADDLE DNA AND ELECTRO-PHORENE GEL |
-
1997
- 1997-11-05 EP EP97946582A patent/EP0941366A2/en not_active Withdrawn
- 1997-11-05 WO PCT/US1997/020313 patent/WO1998020165A2/en not_active Application Discontinuation
Non-Patent Citations (1)
Title |
---|
See references of WO9820165A2 * |
Also Published As
Publication number | Publication date |
---|---|
WO1998020165A2 (en) | 1998-05-14 |
WO1998020165A3 (en) | 1998-11-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US5856104A (en) | Polymorphisms in the glucose-6 phosphate dehydrogenase locus | |
WO1998020165A2 (en) | Biallelic markers | |
US6525185B1 (en) | Polymorphisms associated with hypertension | |
US20060263807A1 (en) | Methods for polymorphism identification and profiling | |
US6869762B1 (en) | Crohn's disease-related polymorphisms | |
US20060188875A1 (en) | Human genomic polymorphisms | |
WO1998038846A2 (en) | Genetic compositions and methods | |
US20020037508A1 (en) | Human single nucleotide polymorphisms | |
WO2001066800A2 (en) | Human single nucleotide polymorphisms | |
EP0812922A2 (en) | Polymorphisms in human mitochondrial nucleic acid | |
WO1999050454A2 (en) | Coding sequence polymorphisms in vascular pathology genes | |
WO2001018250A2 (en) | Single nucleotide polymorphisms in genes | |
WO1998024796A1 (en) | Brassica polymorphisms | |
EP1068354A2 (en) | Biallelic markers | |
WO1998058529A2 (en) | Genetic compositions and methods | |
US20030039973A1 (en) | Human single nucleotide polymorphisms | |
US20030054381A1 (en) | Genetic polymorphisms in the human neurokinin 1 receptor gene and their uses in diagnosis and treatment of diseases | |
EP1024200A2 (en) | Genetic compositions and methods | |
WO2001038576A2 (en) | Human single nucleotide polymorphisms | |
EP1276899A2 (en) | Ibd-related polymorphisms | |
WO1999014228A1 (en) | Genetic compositions and methods | |
WO2000058519A2 (en) | Charaterization of single nucleotide polymorphisms in coding regions of human genes | |
WO2001034840A2 (en) | Genetic compositions and methods | |
US20020155446A1 (en) | Very low density lipoprotein receptor polymorphisms and uses therefor | |
US20030008301A1 (en) | Association between schizophrenia and a two-marker haplotype near PILB gene |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
AK | Designated contracting states |
Kind code of ref document: A2 Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE |
|
17P | Request for examination filed |
Effective date: 19990604 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
|
18D | Application deemed to be withdrawn |
Effective date: 20030531 |