WO2022132198A2 - Compositions and methods for improved in vitro assembly of polynucleotides - Google Patents
- Publication number
- WO2022132198A2 (PCT/US2021/010063)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- overhangs
- dna
- ligase
- ligation
- fragments
- 238000000034 method Methods 0.000 title claims abstract description 133
- 102000040430 polynucleotide Human genes 0.000 title claims description 131
- 108091033319 polynucleotide Proteins 0.000 title claims description 131
- 239000002157 polynucleotide Substances 0.000 title claims description 131
- 238000000338 in vitro Methods 0.000 title claims description 11
- 239000000203 mixture Substances 0.000 title abstract description 18
- 239000012634 fragment Substances 0.000 claims abstract description 356
- 102000003960 Ligases Human genes 0.000 claims abstract description 201
- 108090000364 Ligases Proteins 0.000 claims abstract description 201
- 238000006243 chemical reaction Methods 0.000 claims abstract description 110
- 108091008146 restriction endonucleases Proteins 0.000 claims abstract description 68
- 108091034117 Oligonucleotide Proteins 0.000 claims abstract description 61
- 239000012190 activator Substances 0.000 claims abstract description 47
- 238000003776 cleavage reaction Methods 0.000 claims abstract description 47
- 230000007017 scission Effects 0.000 claims abstract description 47
- 125000003729 nucleotide group Chemical group 0.000 claims abstract description 43
- 239000002773 nucleotide Substances 0.000 claims abstract description 42
- 230000003612 virological effect Effects 0.000 claims abstract description 27
- 108010061982 DNA Ligases Proteins 0.000 claims description 198
- 102000012410 DNA Ligases Human genes 0.000 claims description 198
- 108020004414 DNA Proteins 0.000 claims description 166
- 239000002202 Polyethylene glycol Substances 0.000 claims description 57
- 229920001223 polyethylene glycol Polymers 0.000 claims description 57
- 239000000758 substrate Substances 0.000 claims description 49
- 230000000295 complement effect Effects 0.000 claims description 48
- 102000004190 Enzymes Human genes 0.000 claims description 46
- 108090000790 Enzymes Proteins 0.000 claims description 46
- 108090000623 proteins and genes Proteins 0.000 claims description 37
- 239000013612 plasmid Substances 0.000 claims description 28
- 239000011541 reaction mixture Substances 0.000 claims description 27
- 238000013461 design Methods 0.000 claims description 25
- 108010042407 Endonucleases Proteins 0.000 claims description 22
- 230000000694 effects Effects 0.000 claims description 22
- 238000003752 polymerase chain reaction Methods 0.000 claims description 21
- 238000005304 joining Methods 0.000 claims description 20
- 210000004027 cell Anatomy 0.000 claims description 18
- 238000000137 annealing Methods 0.000 claims description 17
- 230000001351 cycling effect Effects 0.000 claims description 17
- 102100031780 Endonuclease Human genes 0.000 claims description 14
- 102000004169 proteins and genes Human genes 0.000 claims description 14
- 102000053602 DNA Human genes 0.000 claims description 13
- 150000003839 salts Chemical class 0.000 claims description 13
- 238000004458 analytical method Methods 0.000 claims description 11
- 241000701245 Paramecium bursaria Chlorella virus 1 Species 0.000 claims description 10
- 210000001744 T-lymphocyte Anatomy 0.000 claims description 10
- 230000001580 bacterial effect Effects 0.000 claims description 9
- 239000003795 chemical substances by application Substances 0.000 claims description 9
- 238000012360 testing method Methods 0.000 claims description 9
- 239000013598 vector Substances 0.000 claims description 9
- 108091032973 (ribonucleotides)n+m Proteins 0.000 claims description 8
- 230000003321 amplification Effects 0.000 claims description 8
- 230000037353 metabolic pathway Effects 0.000 claims description 8
- 238000003199 nucleic acid amplification method Methods 0.000 claims description 8
- 230000002829 reductive effect Effects 0.000 claims description 8
- 230000002194 synthesizing effect Effects 0.000 claims description 8
- 102000011724 DNA Repair Enzymes Human genes 0.000 claims description 7
- 108010076525 DNA Repair Enzymes Proteins 0.000 claims description 7
- 230000027455 binding Effects 0.000 claims description 7
- 238000007702 DNA assembly Methods 0.000 claims description 6
- 241000700605 Viruses Species 0.000 claims description 6
- 239000000427 antigen Substances 0.000 claims description 6
- 108091007433 antigens Proteins 0.000 claims description 6
- 102000036639 antigens Human genes 0.000 claims description 6
- 230000002255 enzymatic effect Effects 0.000 claims description 6
- 238000003780 insertion Methods 0.000 claims description 6
- 230000037431 insertion Effects 0.000 claims description 6
- 230000035772 mutation Effects 0.000 claims description 6
- 210000004881 tumor cell Anatomy 0.000 claims description 6
- 229960005486 vaccine Drugs 0.000 claims description 6
- 229910019142 PO4 Inorganic materials 0.000 claims description 5
- 210000000349 chromosome Anatomy 0.000 claims description 5
- 230000001419 dependent effect Effects 0.000 claims description 5
- NBIIXXVUZAFLBC-UHFFFAOYSA-K phosphate Chemical compound [O-]P([O-])([O-])=O NBIIXXVUZAFLBC-UHFFFAOYSA-K 0.000 claims description 5
- 239000010452 phosphate Substances 0.000 claims description 5
- 230000008439 repair process Effects 0.000 claims description 5
- 206010028980 Neoplasm Diseases 0.000 claims description 4
- VYPSYNLAJGMNEJ-UHFFFAOYSA-N Silicium dioxide Chemical group O=[Si]=O VYPSYNLAJGMNEJ-UHFFFAOYSA-N 0.000 claims description 4
- 230000003115 biocidal effect Effects 0.000 claims description 4
- 230000014509 gene expression Effects 0.000 claims description 4
- 238000005259 measurement Methods 0.000 claims description 4
- 108091033409 CRISPR Proteins 0.000 claims description 3
- 125000002887 hydroxy group Chemical group [H]O* 0.000 claims description 3
- 238000004519 manufacturing process Methods 0.000 claims description 3
- 108010019670 Chimeric Antigen Receptors Proteins 0.000 claims description 2
- 240000004808 Saccharomyces cerevisiae Species 0.000 claims description 2
- 210000002865 immune cell Anatomy 0.000 claims description 2
- 239000007787 solid Substances 0.000 claims description 2
- 125000003275 alpha amino acid group Chemical group 0.000 claims 2
- 241000894006 Bacteria Species 0.000 abstract description 9
- 239000000047 product Substances 0.000 description 61
- 239000000872 buffer Substances 0.000 description 34
- 150000007523 nucleic acids Chemical class 0.000 description 28
- 238000011534 incubation Methods 0.000 description 24
- 102000039446 nucleic acids Human genes 0.000 description 23
- 108020004707 nucleic acids Proteins 0.000 description 23
- 238000000429 assembly Methods 0.000 description 19
- 230000000712 assembly Effects 0.000 description 19
- 238000003556 assay Methods 0.000 description 17
- 238000012163 sequencing technique Methods 0.000 description 17
- 239000011159 matrix material Substances 0.000 description 14
- 239000013615 primer Substances 0.000 description 13
- 241000588724 Escherichia coli Species 0.000 description 12
- 108091093088 Amplicon Proteins 0.000 description 9
- 238000010367 cloning Methods 0.000 description 9
- 238000003786 synthesis reaction Methods 0.000 description 9
- 102000004533 Endonucleases Human genes 0.000 description 8
- JLCPHMBAVCMARE-UHFFFAOYSA-N [3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-[[3-[[3-[[3-[[3-[[3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-hydroxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methyl [5-(6-aminopurin-9-yl)-2-(hydroxymethyl)oxolan-3-yl] hydrogen phosphate Polymers 
Cc1cn(C2CC(OP(O)(=O)OCC3OC(CC3OP(O)(=O)OCC3OC(CC3O)n3cnc4c3nc(N)[nH]c4=O)n3cnc4c3nc(N)[nH]c4=O)C(COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3CO)n3cnc4c(N)ncnc34)n3ccc(N)nc3=O)n3cnc4c(N)ncnc34)n3ccc(N)nc3=O)n3ccc(N)nc3=O)n3ccc(N)nc3=O)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cc(C)c(=O)[nH]c3=O)n3cc(C)c(=O)[nH]c3=O)n3ccc(N)nc3=O)n3cc(C)c(=O)[nH]c3=O)n3cnc4c3nc(N)[nH]c4=O)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)O2)c(=O)[nH]c1=O JLCPHMBAVCMARE-UHFFFAOYSA-N 0.000 description 8
- 230000015572 biosynthetic process Effects 0.000 description 8
- 239000003153 chemical reaction reagent Substances 0.000 description 8
- 238000004590 computer program Methods 0.000 description 8
- 230000009466 transformation Effects 0.000 description 8
- JEOQACOXAOEPLX-WCCKRBBISA-N (2s)-2-amino-5-(diaminomethylideneamino)pentanoic acid;1,3-thiazolidine-4-carboxylic acid Chemical compound OC(=O)C1CSCN1.OC(=O)[C@@H](N)CCCN=C(N)N JEOQACOXAOEPLX-WCCKRBBISA-N 0.000 description 7
- VUFNLQXQSDUXKB-DOFZRALJSA-N 2-[4-[4-[bis(2-chloroethyl)amino]phenyl]butanoyloxy]ethyl (5z,8z,11z,14z)-icosa-5,8,11,14-tetraenoate Chemical compound CCCCC\C=C/C\C=C/C\C=C/C\C=C/CCCC(=O)OCCOC(=O)CCCC1=CC=C(N(CCCl)CCCl)C=C1 VUFNLQXQSDUXKB-DOFZRALJSA-N 0.000 description 7
- 102100039217 3-ketoacyl-CoA thiolase, peroxisomal Human genes 0.000 description 7
- 101100153048 Homo sapiens ACAA1 gene Proteins 0.000 description 7
- 108091028043 Nucleic acid sequence Proteins 0.000 description 7
- 238000003491 array Methods 0.000 description 7
- 230000008901 benefit Effects 0.000 description 7
- 230000003068 static effect Effects 0.000 description 7
- 231100000331 toxic Toxicity 0.000 description 7
- 230000002588 toxic effect Effects 0.000 description 7
- YRIZYWQGELRKNT-UHFFFAOYSA-N 1,3,5-trichloro-1,3,5-triazinane-2,4,6-trione Chemical compound ClN1C(=O)N(Cl)C(=O)N(Cl)C1=O YRIZYWQGELRKNT-UHFFFAOYSA-N 0.000 description 6
- FSNCEEGOMTYXKY-JTQLQIEISA-N Lycoperodine 1 Natural products N1C2=CC=CC=C2C2=C1CN[C@H](C(=O)O)C2 FSNCEEGOMTYXKY-JTQLQIEISA-N 0.000 description 6
- ISAKRJDGNUQOIC-UHFFFAOYSA-N Uracil Chemical group O=C1C=CNC(=O)N1 ISAKRJDGNUQOIC-UHFFFAOYSA-N 0.000 description 6
- 238000002474 experimental method Methods 0.000 description 6
- 230000036961 partial effect Effects 0.000 description 6
- 239000011535 reaction buffer Substances 0.000 description 6
- 238000011160 research Methods 0.000 description 6
- YNJBWRMUSHSURL-UHFFFAOYSA-N trichloroacetic acid Chemical compound OC(=O)C(Cl)(Cl)Cl YNJBWRMUSHSURL-UHFFFAOYSA-N 0.000 description 6
- 241000711573 Coronaviridae Species 0.000 description 5
- 101000927847 Homo sapiens DNA ligase 3 Proteins 0.000 description 5
- 101000829958 Homo sapiens N-acetyllactosaminide beta-1,6-N-acetylglucosaminyl-transferase Proteins 0.000 description 5
- 102100023315 N-acetyllactosaminide beta-1,6-N-acetylglucosaminyl-transferase Human genes 0.000 description 5
- 102100030569 Nuclear receptor corepressor 2 Human genes 0.000 description 5
- 101710153660 Nuclear receptor corepressor 2 Proteins 0.000 description 5
- 230000008859 change Effects 0.000 description 5
- 238000004891 communication Methods 0.000 description 5
- 238000005520 cutting process Methods 0.000 description 5
- 230000007423 decrease Effects 0.000 description 5
- 238000011161 development Methods 0.000 description 5
- 102000046719 human LIG3 Human genes 0.000 description 5
- 238000009396 hybridization Methods 0.000 description 5
- 230000006872 improvement Effects 0.000 description 5
- 238000012986 modification Methods 0.000 description 5
- 230000004048 modification Effects 0.000 description 5
- 238000005580 one pot reaction Methods 0.000 description 5
- KDCGOANMDULRCW-UHFFFAOYSA-N 7H-purine Chemical compound N1=CNC2=NC=NC2=C1 KDCGOANMDULRCW-UHFFFAOYSA-N 0.000 description 4
- 241000701867 Enterobacteria phage T7 Species 0.000 description 4
- 108091092584 GDNA Proteins 0.000 description 4
- 239000008118 PEG 6000 Substances 0.000 description 4
- 229920002584 Polyethylene Glycol 6000 Polymers 0.000 description 4
- WCDYMMVGBZNUGB-ORPFKJIMSA-N [(2r,3r,4s,5r,6r)-6-[[(1r,3r,4r,5r,6r)-4,5-dihydroxy-2,7-dioxabicyclo[4.2.0]octan-3-yl]oxy]-3,4,5-trihydroxyoxan-2-yl]methyl 3-hydroxy-2-tetradecyloctadecanoate Chemical compound O[C@@H]1[C@@H](O)[C@@H](O)[C@@H](COC(=O)C(CCCCCCCCCCCCCC)C(O)CCCCCCCCCCCCCCC)O[C@@H]1O[C@@H]1[C@H](O)[C@@H](O)[C@H]2OC[C@H]2O1 WCDYMMVGBZNUGB-ORPFKJIMSA-N 0.000 description 4
- 150000001413 amino acids Chemical class 0.000 description 4
- 230000003247 decreasing effect Effects 0.000 description 4
- 230000029087 digestion Effects 0.000 description 4
- 238000009826 distribution Methods 0.000 description 4
- 239000002609 medium Substances 0.000 description 4
- 238000002844 melting Methods 0.000 description 4
- 230000008018 melting Effects 0.000 description 4
- SCVFZCLFOSHCOH-UHFFFAOYSA-M potassium acetate Chemical compound [K+].CC([O-])=O SCVFZCLFOSHCOH-UHFFFAOYSA-M 0.000 description 4
- 238000003860 storage Methods 0.000 description 4
- 101150075675 tatC gene Proteins 0.000 description 4
- RWQNBRDOKXIBIV-UHFFFAOYSA-N thymine Chemical group CC1=CNC(=O)NC1=O RWQNBRDOKXIBIV-UHFFFAOYSA-N 0.000 description 4
- 108010052418 (N-(2-((4-((2-((4-(9-acridinylamino)phenyl)amino)-2-oxoethyl)amino)-4-oxobutyl)amino)-1-(1H-imidazol-4-ylmethyl)-1-oxoethyl)-6-(((-2-aminoethyl)amino)methyl)-2-pyridinecarboxamidato) iron(1+) Proteins 0.000 description 3
- BCOSEZGCLGPUSL-UHFFFAOYSA-N 2,3,3-trichloroprop-2-enoyl chloride Chemical compound ClC(Cl)=C(Cl)C(Cl)=O BCOSEZGCLGPUSL-UHFFFAOYSA-N 0.000 description 3
- 229920001817 Agar Polymers 0.000 description 3
- 102100024044 Aprataxin Human genes 0.000 description 3
- 101710105690 Aprataxin Proteins 0.000 description 3
- 241000721047 Danaus plexippus Species 0.000 description 3
- 241001123946 Gaga Species 0.000 description 3
- 102100040004 Gamma-glutamylcyclotransferase Human genes 0.000 description 3
- 102100036263 Glutamyl-tRNA(Gln) amidotransferase subunit C, mitochondrial Human genes 0.000 description 3
- 101000886680 Homo sapiens Gamma-glutamylcyclotransferase Proteins 0.000 description 3
- 101001001786 Homo sapiens Glutamyl-tRNA(Gln) amidotransferase subunit C, mitochondrial Proteins 0.000 description 3
- 101001128634 Homo sapiens NADH dehydrogenase [ubiquinone] 1 beta subcomplex subunit 2, mitochondrial Proteins 0.000 description 3
- 101000869690 Homo sapiens Protein S100-A8 Proteins 0.000 description 3
- 101000666730 Homo sapiens T-complex protein 1 subunit alpha Proteins 0.000 description 3
- 102100032194 NADH dehydrogenase [ubiquinone] 1 beta subcomplex subunit 2, mitochondrial Human genes 0.000 description 3
- 101710163270 Nuclease Proteins 0.000 description 3
- 102100032442 Protein S100-A8 Human genes 0.000 description 3
- 102100038410 T-complex protein 1 subunit alpha Human genes 0.000 description 3
- 108010006785 Taq Polymerase Proteins 0.000 description 3
- 239000008272 agar Substances 0.000 description 3
- 239000000470 constituent Substances 0.000 description 3
- 238000010586 diagram Methods 0.000 description 3
- 206010016256 fatigue Diseases 0.000 description 3
- 230000006870 function Effects 0.000 description 3
- BPHPUYQFMNQIOC-NXRLNHOXSA-N isopropyl beta-D-thiogalactopyranoside Chemical compound CC(C)S[C@@H]1O[C@H](CO)[C@H](O)[C@H](O)[C@H]1O BPHPUYQFMNQIOC-NXRLNHOXSA-N 0.000 description 3
- CJWXCNXHAIFFMH-AVZHFPDBSA-N n-[(2s,3r,4s,5s,6r)-2-[(2r,3r,4s,5r)-2-acetamido-4,5,6-trihydroxy-1-oxohexan-3-yl]oxy-3,5-dihydroxy-6-methyloxan-4-yl]acetamide Chemical compound C[C@H]1O[C@@H](O[C@@H]([C@@H](O)[C@H](O)CO)[C@@H](NC(C)=O)C=O)[C@H](O)[C@@H](NC(C)=O)[C@@H]1O CJWXCNXHAIFFMH-AVZHFPDBSA-N 0.000 description 3
- 238000005457 optimization Methods 0.000 description 3
- 230000008569 process Effects 0.000 description 3
- 230000002441 reversible effect Effects 0.000 description 3
- 238000012552 review Methods 0.000 description 3
- 238000012216 screening Methods 0.000 description 3
- 230000037432 silent mutation Effects 0.000 description 3
- 125000006850 spacer group Chemical group 0.000 description 3
- 230000009897 systematic effect Effects 0.000 description 3
- 238000012546 transfer Methods 0.000 description 3
- 229940035893 uracil Drugs 0.000 description 3
- 238000012795 verification Methods 0.000 description 3
- FMKJUUQOYOHLTF-OWOJBTEDSA-N (e)-4-azaniumylbut-2-enoate Chemical compound NC\C=C\C(O)=O FMKJUUQOYOHLTF-OWOJBTEDSA-N 0.000 description 2
- JKMPXGJJRMOELF-UHFFFAOYSA-N 1,3-thiazole-2,4,5-tricarboxylic acid Chemical compound OC(=O)C1=NC(C(O)=O)=C(C(O)=O)S1 JKMPXGJJRMOELF-UHFFFAOYSA-N 0.000 description 2
- JTTIOYHBNXDJOD-UHFFFAOYSA-N 2,4,6-triaminopyrimidine Chemical compound NC1=CC(N)=NC(N)=N1 JTTIOYHBNXDJOD-UHFFFAOYSA-N 0.000 description 2
- OPIFSICVWOWJMJ-AEOCFKNESA-N 5-bromo-4-chloro-3-indolyl beta-D-galactoside Chemical compound O[C@@H]1[C@@H](O)[C@@H](O)[C@@H](CO)O[C@H]1OC1=CNC2=CC=C(Br)C(Cl)=C12 OPIFSICVWOWJMJ-AEOCFKNESA-N 0.000 description 2
- 102100039819 Actin, alpha cardiac muscle 1 Human genes 0.000 description 2
- 241000726103 Atta Species 0.000 description 2
- 102100025570 Cancer/testis antigen 1 Human genes 0.000 description 2
- 108010008758 Chlorella virus DNA ligase Proteins 0.000 description 2
- 108010008286 DNA nucleotidylexotransferase Proteins 0.000 description 2
- 230000006820 DNA synthesis Effects 0.000 description 2
- 230000004568 DNA-binding Effects 0.000 description 2
- 102100029764 DNA-directed DNA/RNA polymerase mu Human genes 0.000 description 2
- 108010067770 Endopeptidase K Proteins 0.000 description 2
- 108060002716 Exonuclease Proteins 0.000 description 2
- OOFLZRMKTMLSMH-UHFFFAOYSA-N H4atta Chemical compound OC(=O)CN(CC(O)=O)CC1=CC=CC(C=2N=C(C=C(C=2)C=2C3=CC=CC=C3C=C3C=CC=CC3=2)C=2N=C(CN(CC(O)=O)CC(O)=O)C=CC=2)=N1 OOFLZRMKTMLSMH-UHFFFAOYSA-N 0.000 description 2
- 101000959247 Homo sapiens Actin, alpha cardiac muscle 1 Proteins 0.000 description 2
- 101000856237 Homo sapiens Cancer/testis antigen 1 Proteins 0.000 description 2
- 101000724418 Homo sapiens Neutral amino acid transporter B(0) Proteins 0.000 description 2
- 239000006137 Luria-Bertani broth Substances 0.000 description 2
- CSNNHWWHGAXBCP-UHFFFAOYSA-L Magnesium sulfate Chemical compound [Mg+2].[O-][S+2]([O-])([O-])[O-] CSNNHWWHGAXBCP-UHFFFAOYSA-L 0.000 description 2
- 102100028267 Neutral amino acid transporter B(0) Human genes 0.000 description 2
- 102000014076 Nucleotidyl transferase domains Human genes 0.000 description 2
- 108050003811 Nucleotidyl transferase domains Proteins 0.000 description 2
- 229920002562 Polyethylene Glycol 3350 Polymers 0.000 description 2
- 102100029812 Protein S100-A12 Human genes 0.000 description 2
- 101710110949 Protein S100-A12 Proteins 0.000 description 2
- 238000009825 accumulation Methods 0.000 description 2
- 239000000654 additive Substances 0.000 description 2
- 238000013459 approach Methods 0.000 description 2
- 239000011324 bead Substances 0.000 description 2
- 230000002457 bidirectional effect Effects 0.000 description 2
- 230000033228 biological regulation Effects 0.000 description 2
- 238000004364 calculation method Methods 0.000 description 2
- 239000007795 chemical reaction product Substances 0.000 description 2
- 229960005091 chloramphenicol Drugs 0.000 description 2
- WIIZWVCIJKGZOK-RKDXNWHRSA-N chloramphenicol Chemical compound ClC(Cl)C(=O)N[C@H](CO)[C@H](O)C1=CC=C([N+]([O-])=O)C=C1 WIIZWVCIJKGZOK-RKDXNWHRSA-N 0.000 description 2
- 230000005757 colony formation Effects 0.000 description 2
- 239000003086 colorant Substances 0.000 description 2
- OPTASPLRGRRNAP-UHFFFAOYSA-N cytosine Chemical compound NC=1C=CNC(=O)N=1 OPTASPLRGRRNAP-UHFFFAOYSA-N 0.000 description 2
- 210000003527 eukaryotic cell Anatomy 0.000 description 2
- 102000013165 exonuclease Human genes 0.000 description 2
- 230000004927 fusion Effects 0.000 description 2
- 230000002068 genetic effect Effects 0.000 description 2
- 208000015181 infectious disease Diseases 0.000 description 2
- 230000002401 inhibitory effect Effects 0.000 description 2
- 238000002032 lab-on-a-chip Methods 0.000 description 2
- 238000007169 ligase reaction Methods 0.000 description 2
- 230000000670 limiting effect Effects 0.000 description 2
- 239000007788 liquid Substances 0.000 description 2
- 238000007403 mPCR Methods 0.000 description 2
- UEGPKNKPLBYCNK-UHFFFAOYSA-L magnesium acetate Chemical compound [Mg+2].CC([O-])=O.CC([O-])=O UEGPKNKPLBYCNK-UHFFFAOYSA-L 0.000 description 2
- 239000011654 magnesium acetate Substances 0.000 description 2
- 229940069446 magnesium acetate Drugs 0.000 description 2
- 235000011285 magnesium acetate Nutrition 0.000 description 2
- 230000007246 mechanism Effects 0.000 description 2
- 238000002156 mixing Methods 0.000 description 2
- 230000037361 pathway Effects 0.000 description 2
- 235000011056 potassium acetate Nutrition 0.000 description 2
- 239000002243 precursor Substances 0.000 description 2
- 230000003252 repetitive effect Effects 0.000 description 2
- 238000010845 search algorithm Methods 0.000 description 2
- 239000000243 solution Substances 0.000 description 2
- 239000000126 substance Substances 0.000 description 2
- 229940113082 thymine Drugs 0.000 description 2
- 238000013518 transcription Methods 0.000 description 2
- 230000035897 transcription Effects 0.000 description 2
- 238000001890 transfection Methods 0.000 description 2
- 238000000844 transformation Methods 0.000 description 2
- PIEPQKCYPFFYMG-UHFFFAOYSA-N tris acetate Chemical compound CC(O)=O.OCC(N)(CO)CO PIEPQKCYPFFYMG-UHFFFAOYSA-N 0.000 description 2
- BAAVRTJSLCSMNM-CMOCDZPBSA-N (2s)-2-[[(2s)-2-[[(2s)-2-[[(2s)-2-amino-3-(4-hydroxyphenyl)propanoyl]amino]-4-carboxybutanoyl]amino]-3-(4-hydroxyphenyl)propanoyl]amino]pentanedioic acid Chemical compound C([C@H](N)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC=1C=CC(O)=CC=1)C(=O)N[C@@H](CCC(O)=O)C(O)=O)C1=CC=C(O)C=C1 BAAVRTJSLCSMNM-CMOCDZPBSA-N 0.000 description 1
- 102000040650 (ribonucleotides)n+m Human genes 0.000 description 1
- JEPVUMTVFPQKQE-AAKCMJRZSA-N 2-[(1s,2s,3r,4s)-1,2,3,4,5-pentahydroxypentyl]-1,3-thiazolidine-4-carboxylic acid Chemical compound OC[C@H](O)[C@@H](O)[C@H](O)[C@H](O)C1NC(C(O)=O)CS1 JEPVUMTVFPQKQE-AAKCMJRZSA-N 0.000 description 1
- ASJSAQIRZKANQN-CRCLSJGQSA-N 2-deoxy-D-ribose Chemical compound OC[C@@H](O)[C@@H](O)CC=O ASJSAQIRZKANQN-CRCLSJGQSA-N 0.000 description 1
- FVFVNNKYKYZTJU-UHFFFAOYSA-N 6-chloro-1,3,5-triazine-2,4-diamine Chemical compound NC1=NC(N)=NC(Cl)=N1 FVFVNNKYKYZTJU-UHFFFAOYSA-N 0.000 description 1
- PWJFNRJRHXWEPT-UHFFFAOYSA-N ADP ribose Natural products C1=NC=2C(N)=NC=NC=2N1C1OC(COP(O)(=O)OP(O)(=O)OCC(O)C(O)C(O)C=O)C(O)C1O PWJFNRJRHXWEPT-UHFFFAOYSA-N 0.000 description 1
- SRNWOUGRCWSEMX-KEOHHSTQSA-N ADP-beta-D-ribose Chemical group C([C@H]1O[C@H]([C@@H]([C@@H]1O)O)N1C=2N=CN=C(C=2N=C1)N)OP(O)(=O)OP(O)(=O)OC[C@H]1O[C@@H](O)[C@H](O)[C@@H]1O SRNWOUGRCWSEMX-KEOHHSTQSA-N 0.000 description 1
- 241000023308 Acca Species 0.000 description 1
- 229930024421 Adenine Natural products 0.000 description 1
- GFFGJBXGBJISGV-UHFFFAOYSA-N Adenine Chemical compound NC1=NC=NC2=C1N=CN2 GFFGJBXGBJISGV-UHFFFAOYSA-N 0.000 description 1
- 244000058084 Aegle marmelos Species 0.000 description 1
- 235000003930 Aegle marmelos Nutrition 0.000 description 1
- 108010088751 Albumins Proteins 0.000 description 1
- 102000009027 Albumins Human genes 0.000 description 1
- 102100022524 Alpha-1-antichymotrypsin Human genes 0.000 description 1
- 108091023037 Aptamer Proteins 0.000 description 1
- 241001083841 Aquatica Species 0.000 description 1
- 101000651036 Arabidopsis thaliana Galactolipid galactosyltransferase SFR2, chloroplastic Proteins 0.000 description 1
- 244000063299 Bacillus subtilis Species 0.000 description 1
- 235000014469 Bacillus subtilis Nutrition 0.000 description 1
- 208000035143 Bacterial infection Diseases 0.000 description 1
- 101100263837 Bovine ephemeral fever virus (strain BB7721) beta gene Proteins 0.000 description 1
- 238000011357 CAR T-cell therapy Methods 0.000 description 1
- 241001678559 COVID-19 virus Species 0.000 description 1
- 108010077544 Chromatin Proteins 0.000 description 1
- 108091026890 Coding region Proteins 0.000 description 1
- 108020004635 Complementary DNA Proteins 0.000 description 1
- 108091035707 Consensus sequence Proteins 0.000 description 1
- 108010060248 DNA Ligase ATP Proteins 0.000 description 1
- 102000008158 DNA Ligase ATP Human genes 0.000 description 1
- 101710156804 DNA ligase A Proteins 0.000 description 1
- 239000003155 DNA primer Substances 0.000 description 1
- 230000008265 DNA repair mechanism Effects 0.000 description 1
- 230000007018 DNA scission Effects 0.000 description 1
- 238000001712 DNA sequencing Methods 0.000 description 1
- KCXVZYZYPLLWCC-UHFFFAOYSA-N EDTA Chemical compound OC(=O)CN(CC(O)=O)CCN(CC(O)=O)CC(O)=O KCXVZYZYPLLWCC-UHFFFAOYSA-N 0.000 description 1
- 241000196324 Embryophyta Species 0.000 description 1
- 101710180995 Endonuclease 1 Proteins 0.000 description 1
- 101100316840 Enterobacteria phage P4 Beta gene Proteins 0.000 description 1
- 108700028146 Genetic Enhancer Elements Proteins 0.000 description 1
- WQZGKKKJIJFFOK-GASJEMHNSA-N Glucose Natural products OC[C@H]1OC(O)[C@H](O)[C@@H](O)[C@@H]1O WQZGKKKJIJFFOK-GASJEMHNSA-N 0.000 description 1
- 102100040870 Glycine amidinotransferase, mitochondrial Human genes 0.000 description 1
- 241000856850 Goose coronavirus Species 0.000 description 1
- UYTPUPDQBNUYGX-UHFFFAOYSA-N Guanine Natural products O=C1NC(N)=NC2=C1N=CN2 UYTPUPDQBNUYGX-UHFFFAOYSA-N 0.000 description 1
- 108020005004 Guide RNA Proteins 0.000 description 1
- 101000678026 Homo sapiens Alpha-1-antichymotrypsin Proteins 0.000 description 1
- 101000893303 Homo sapiens Glycine amidinotransferase, mitochondrial Proteins 0.000 description 1
- 101000856513 Homo sapiens Inactive N-acetyllactosaminide alpha-1,3-galactosyltransferase Proteins 0.000 description 1
- 101000804764 Homo sapiens Lymphotactin Proteins 0.000 description 1
- 101000957437 Homo sapiens Mitochondrial carnitine/acylcarnitine carrier protein Proteins 0.000 description 1
- 102100025509 Inactive N-acetyllactosaminide alpha-1,3-galactosyltransferase Human genes 0.000 description 1
- 102100035304 Lymphotactin Human genes 0.000 description 1
- 102100038738 Mitochondrial carnitine/acylcarnitine carrier protein Human genes 0.000 description 1
- 101150101095 Mmp12 gene Proteins 0.000 description 1
- 108010006519 Molecular Chaperones Proteins 0.000 description 1
- 238000000342 Monte Carlo simulation Methods 0.000 description 1
- BAWFJGJZGIEFAR-NNYOXOHSSA-O NAD(+) Chemical compound NC(=O)C1=CC=C[N+]([C@H]2[C@@H]([C@H](O)[C@@H](COP(O)(=O)OP(O)(=O)OC[C@@H]3[C@H]([C@@H](O)[C@@H](O3)N3C4=NC=NC(N)=C4N=C3)O)O2)O)=C1 BAWFJGJZGIEFAR-NNYOXOHSSA-O 0.000 description 1
- 108091061960 Naked DNA Proteins 0.000 description 1
- 108700026244 Open Reading Frames Proteins 0.000 description 1
- 108091081548 Palindromic sequence Proteins 0.000 description 1
- 241001318097 Paucibacter Species 0.000 description 1
- 229920002582 Polyethylene Glycol 600 Polymers 0.000 description 1
- 101000619947 Pseudomonas aeruginosa (strain ATCC 15692 / DSM 22644 / CIP 104116 / JCM 14847 / LMG 12228 / 1C / PRS 101 / PAO1) DNA repair polymerase Proteins 0.000 description 1
- 108020004682 Single-Stranded DNA Proteins 0.000 description 1
- 201000008754 Tenosynovial giant cell tumor Diseases 0.000 description 1
- 241000589499 Thermus thermophilus Species 0.000 description 1
- 239000007983 Tris buffer Substances 0.000 description 1
- 229920004890 Triton X-100 Polymers 0.000 description 1
- 239000013504 Triton X-100 Substances 0.000 description 1
- HCHKCACWOHOZIP-UHFFFAOYSA-N Zinc Chemical compound [Zn] HCHKCACWOHOZIP-UHFFFAOYSA-N 0.000 description 1
- YRKCREAYFQTBPV-UHFFFAOYSA-N acetylacetone Chemical compound CC(=O)CC(C)=O YRKCREAYFQTBPV-UHFFFAOYSA-N 0.000 description 1
- 229960000643 adenine Drugs 0.000 description 1
- 239000011543 agarose gel Substances 0.000 description 1
- BFNBIHQBYMNNAN-UHFFFAOYSA-N ammonium sulfate Chemical compound N.N.OS(O)(=O)=O BFNBIHQBYMNNAN-UHFFFAOYSA-N 0.000 description 1
- 229910052921 ammonium sulfate Inorganic materials 0.000 description 1
- 239000003242 anti bacterial agent Substances 0.000 description 1
- 230000003466 anti-cipated effect Effects 0.000 description 1
- 229940088710 antibiotic agent Drugs 0.000 description 1
- 208000022362 bacterial infectious disease Diseases 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000000903 blocking effect Effects 0.000 description 1
- 238000004422 calculation algorithm Methods 0.000 description 1
- 230000003197 catalytic effect Effects 0.000 description 1
- 238000002659 cell therapy Methods 0.000 description 1
- 108091092356 cellular DNA Proteins 0.000 description 1
- 210000003483 chromatin Anatomy 0.000 description 1
- 238000009643 clonogenic assay Methods 0.000 description 1
- 231100000096 clonogenic assay Toxicity 0.000 description 1
- 230000001332 colony forming effect Effects 0.000 description 1
- 230000000052 comparative effect Effects 0.000 description 1
- 239000002299 complementary DNA Substances 0.000 description 1
- 238000010205 computational analysis Methods 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 238000011109 contamination Methods 0.000 description 1
- 229940104302 cytosine Drugs 0.000 description 1
- 238000013523 data management Methods 0.000 description 1
- 238000013500 data storage Methods 0.000 description 1
- 238000012217 deletion Methods 0.000 description 1
- 230000037430 deletion Effects 0.000 description 1
- 230000001627 detrimental effect Effects 0.000 description 1
- 239000008121 dextrose Substances 0.000 description 1
- 208000035647 diffuse type tenosynovial giant cell tumor Diseases 0.000 description 1
- 230000003292 diminished effect Effects 0.000 description 1
- 230000034431 double-strand break repair via homologous recombination Effects 0.000 description 1
- 229940079593 drug Drugs 0.000 description 1
- 239000003814 drug Substances 0.000 description 1
- 241001493065 dsRNA viruses Species 0.000 description 1
- 230000009977 dual effect Effects 0.000 description 1
- 230000008030 elimination Effects 0.000 description 1
- 238000003379 elimination reaction Methods 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 239000003623 enhancer Substances 0.000 description 1
- 230000007717 exclusion Effects 0.000 description 1
- 108010052305 exodeoxyribonuclease III Proteins 0.000 description 1
- 238000001943 fluorescence-activated cell sorting Methods 0.000 description 1
- 238000013467 fragmentation Methods 0.000 description 1
- 238000006062 fragmentation reaction Methods 0.000 description 1
- 230000004034 genetic regulation Effects 0.000 description 1
- 238000010448 genetic screening Methods 0.000 description 1
- 239000001963 growth medium Substances 0.000 description 1
- IVSXFFJGASXYCL-UHFFFAOYSA-N guanine Chemical compound O=C1NC(N)=NC2=NC=N[C]21 IVSXFFJGASXYCL-UHFFFAOYSA-N 0.000 description 1
- 238000002744 homologous recombination Methods 0.000 description 1
- 230000006801 homologous recombination Effects 0.000 description 1
- 229910052739 hydrogen Inorganic materials 0.000 description 1
- 239000001257 hydrogen Substances 0.000 description 1
- 238000000126 in silico method Methods 0.000 description 1
- 238000001727 in vivo Methods 0.000 description 1
- 238000011065 in-situ storage Methods 0.000 description 1
- 230000002458 infectious effect Effects 0.000 description 1
- 239000012212 insulator Substances 0.000 description 1
- 230000010354 integration Effects 0.000 description 1
- 230000009545 invasion Effects 0.000 description 1
- 150000002605 large molecules Chemical class 0.000 description 1
- 239000004973 liquid crystal related substance Substances 0.000 description 1
- 101150094164 lysY gene Proteins 0.000 description 1
- 125000003588 lysine group Chemical group [H]N([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])(N([H])[H])C(*)=O 0.000 description 1
- 229920002521 macromolecule Polymers 0.000 description 1
- 229910052943 magnesium sulfate Inorganic materials 0.000 description 1
- 238000012423 maintenance Methods 0.000 description 1
- 238000007726 management method Methods 0.000 description 1
- 239000003550 marker Substances 0.000 description 1
- 239000013028 medium composition Substances 0.000 description 1
- 230000005055 memory storage Effects 0.000 description 1
- 230000003340 mental effect Effects 0.000 description 1
- 238000001823 molecular biology technique Methods 0.000 description 1
- 238000001668 nucleic acid synthesis Methods 0.000 description 1
- 239000002777 nucleoside Substances 0.000 description 1
- 150000003833 nucleoside derivatives Chemical class 0.000 description 1
- 238000002515 oligonucleotide synthesis Methods 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- UEZVMMHDMIWARA-UHFFFAOYSA-M phosphonate Chemical compound [O-]P(=O)=O UEZVMMHDMIWARA-UHFFFAOYSA-M 0.000 description 1
- 229920000642 polymer Polymers 0.000 description 1
- 230000008092 positive effect Effects 0.000 description 1
- 230000000644 propagated effect Effects 0.000 description 1
- 239000002096 quantum dot Substances 0.000 description 1
- 238000010791 quenching Methods 0.000 description 1
- 230000009467 reduction Effects 0.000 description 1
- 238000007670 refining Methods 0.000 description 1
- 230000000717 retained effect Effects 0.000 description 1
- 238000007480 sanger sequencing Methods 0.000 description 1
- 238000013341 scale-up Methods 0.000 description 1
- 239000000377 silicon dioxide Substances 0.000 description 1
- 238000002741 site-directed mutagenesis Methods 0.000 description 1
- 238000004513 sizing Methods 0.000 description 1
- 208000002918 testicular germ cell tumor Diseases 0.000 description 1
- 238000002560 therapeutic procedure Methods 0.000 description 1
- 231100000419 toxicity Toxicity 0.000 description 1
- 230000001988 toxicity Effects 0.000 description 1
- LENZDBCJOHFCAS-UHFFFAOYSA-N tris Chemical compound OCC(N)(CO)CO LENZDBCJOHFCAS-UHFFFAOYSA-N 0.000 description 1
- 108010032276 tyrosyl-glutamyl-tyrosyl-glutamic acid Proteins 0.000 description 1
- 241001515965 unidentified phage Species 0.000 description 1
- 230000035899 viability Effects 0.000 description 1
- 210000002845 virion Anatomy 0.000 description 1
- 238000005406 washing Methods 0.000 description 1
- XLYOFNOQVPJJNP-UHFFFAOYSA-N water Substances O XLYOFNOQVPJJNP-UHFFFAOYSA-N 0.000 description 1
- 239000011701 zinc Substances 0.000 description 1
- 229910052725 zinc Inorganic materials 0.000 description 1
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/20—Sequence assembly
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/102—Mutagenizing nucleic acids
- C12N15/1031—Mutagenizing nucleic acids mutagenesis by gene assembly, e.g. assembly by oligonucleotide extension PCR
Definitions
- fragments of DNA are created using restriction endonucleases that generate single strand overhangs on double stranded DNA. Ligation then occurs between overhangs on multiple different fragments to assemble a single double stranded molecule from the fragments.
- Methods for identifying preferred overhangs for polynucleotide fragment assembly under specified criteria for a desired number of fragments have been described in WO 2020/081768.
- the selection of optimized overhangs using T4 DNA ligase were collated using computer software based on extensive sequencing of assembled fragments based on varying the sequence of overhangs. Other factors including temperature and time of incubation were varied and the consequences of these were integrated into the computational analysis.
- the systematic analysis of fidelity and efficiency of fragment assembly and the accessibility of the resulting data in a user-friendly format were shown to facilitate the faithful assembly of large numbers of fragments in a desired order in a time-efficient manner.
- Type IIS restriction endonuclease may give rise to undesirable internal cleavage sites. These can be eliminated by site directed mutagenesis or by design of assembly junction points in the recognition sequence but these elimination strategies take time and increase cost. Internal sites significantly decrease assembly efficiency, as they allow the finished construct to be susceptible to digestion by the restriction enzyme present in the assembly reaction leading to incorrect and unwanted assemblies. Hence, it is desirable to have Type IIS endonucleases that recognize 7 nucleotides for cleavage. Such enzymes would be particularly useful for assembly of multi-fragments where the assembly is complex and maximal efficiency is desirable.
- an endonuclease that is capable of cutting to completion and has no detectable star activity is preferred.
- AarI, which is a Type IIS endonuclease with a 7 nucleotide recognition sequence.
- this endonuclease has star activity and does not cut DNA to completion.
- Neither the DNA nor the protein sequence of AarI nor its buffer requirements are known, so options to improve this enzyme are not available.
- Another aspect of Golden Gate assembly methods is their reliance on T4 DNA ligase. Bias in ligating various complementary overhangs was detected with T4 DNA ligase (Potapov et al. ACS Synthetic Biology, 7, 2665-2674 (2016); Nilsson et al. Nucleic Acids Res. 10:1425-1437 (1982); Goffin et al. Nucleic Acids Res. 15:8755-8771 (1987); Wu et al. Gene, 76: 245-254 (1989); Harada et al. Nucleic Acids Res., 21, 2287-2291 (1993); Showalter et al. Chem Rev. 106: 340-360 (2006); Engler et al.
- a synthetic self-complementary oligonucleotide is provided that is characterized by a double-stranded region and a single strand loop, wherein the double-stranded region contains a recognition sequence for PaqCI® (New England Biolabs, Inc.), has unligatable 3' and 5' ends and cannot be cleaved by PaqCI.
- PaqCI is defined herein as including variants that have no more than 10% amino acid modifications compared to the wild type and retain DNA recognition specificities and cleavage properties.
- the oligonucleotide may be further defined by any one or more of the following features: the double-stranded region having a length of 10 - 50 base pairs; the length of the oligonucleotide less than 110 nucleotides; the 3' end of the oligonucleotide not a 3' hydroxyl; the 5' end of the oligonucleotide is not a 5' phosphate and/or the recognition sequence being CACCTGC; and occurring only once in the oligonucleotide.
- a reaction mixture includes, a synthetic self-complementary oligonucleotide described above and a PaqCI restriction endonuclease or a variant thereof having an amino acid sequence that has at least 90% amino acid sequence identity with SEQ ID NO:1, where PaqCI is defined herein as including variants that have no more than 10% amino acid modifications compared to the wild type and retain DNA recognition specificities and cleavage properties.
- the ratio of PaqCI to the synthetic self-complementary oligonucleotide is in the range of 1 unit PaqCI: 0.75 pmole to 9 pmole oligonucleotide; includes a double-stranded DNA substrate and/or a ligase; the DNA substrate contains one or more recognition sequences for PaqCI and can be cleaved by PaqCI to produce a 4-base overhang; the recognition sequence in the DNA substrate is CACCTGC; the DNA ligase selected from the group consisting of T4 DNA ligase, T3 DNA ligase, T7 DNA ligase, PBCV-1 DNA ligase and hLig3; the ratio of the PaqCI to ligase is 2.5-20 PaqCI Units to 200-800 ligase units; the reaction mixture includes a plurality of plasmid or PCR products that contain fragments that are each flanked by binding sites for Paq
- a method includes the following steps: (a) obtaining a reaction mixture comprising: (i) a synthetic oligonucleotide as described above; (ii) PaqCI; (iii) a ligase; and (iv) a library of DNA substrates each having at least one PaqCI recognition sequence and a cleavage site; (b) cleaving the library of DNA substrates with PaqCI to generate fragments that have 4-base overhangs; and (c) ligating complementary 4-base overhangs together to produce an ordered assembly of the fragments.
- the method may include the following features: the DNA substrates in the library are selected from one or more of the group consisting of: PCR products, plasmids, genomes or chromosomes; step (c) may further include ligating the ordered assembly into a destination vector or viral genome; the destination vector is a plasmid or a chromosome; the ligase may be selected from the group consisting of: T4 DNA ligase, T3 DNA ligase, T7 DNA ligase, PBCV-1 and human ligase 3; there are 10-100 DNA substrates having unique sequences and the ordered assembly comprises 10-100 fragments that are ligated together in step (c); at least 20 DNA substrates having unique sequences are included in the reaction mixture and the ordered assembly comprises at least 20 fragments that are ligated together in step (c); and the reaction mixture may additionally include a DNA repair enzyme, for example EndoMS, a deadenylase, for example, yeast deadenylase, and/or a crowding agent for
- the method in step (a) may include: identifying a set of 4-base overhangs for the reaction mix using a computer tool wherein: (i) the computer tool generates from a data set an optimized fidelity and/or frequency score for a set of 4-base overhangs for the library of DNA, wherein the optimized fidelity and/or frequency score is derived from data on annealing of complementary sequences and data from ligase activity for different 4-base overhangs; and/or (ii) the computer tool provides break points in an in silico sequence to generate fragment sequences for joining in an ordered assembly via optimized 4-base overhangs.
- a kit is provided that contains a synthetic self-complementary oligonucleotide as described above and PaqCI that encompasses variants as defined above.
- kits may include one or more of the following: a ligase; a cofactor selected from the group consisting of a repair enzyme, a mismatch-specific endonuclease such as EndoMS, a deadenylase and a crowding agent such as polyethylene glycol (PEG) having a molecular weight in the range of 600-8000; and instructions for synthesizing a large DNA from component fragments having 4-base overhangs.
- the reagents in the kit may be combined or in two or more containers.
- at least one of the oligonucleotides, ligase and PaqCI variants are freeze dried or immobilized on a solid substrate such as a two dimensional or a three dimensional surface.
- a computer implemented method for selecting a set of overhangs for an ordered assembly reaction performed under selected ligation conditions, that includes (a) receiving: (i) a desired number of overhangs for an assembly reaction and (ii) a length of the overhangs; (b) selecting a set of overhangs from an overhang table, wherein the selected set of overhangs has the desired number of overhangs received in (i) and the length of overhangs received in (ii); (c) selecting a ligase from a plurality of different ligases for ligating the overhangs with reduced bias; (d) for each individual overhang in the set, calculating a ligation fidelity score for the selected ligase, wherein the ligation fidelity score of each individual overhang represents the frequency at which the individual overhang and its complement independently ligate to a perfectly complementary overhang relative to all overhangs in the set and their complements; (e) calculating an overall ligation fidelity score for the set of overhangs based on the calculated
- One or more features of the computer implemented method include: that each of the individual overhangs in the set of overhangs selected in (b) is unique within the set, and is not complementary to another overhang in the set, and is not palindromic; calculating the ligation fidelity score in (c) further includes: consulting the ligation frequency table and bias table for different ligases comprising individual experimentally-defined measurements of the number of ligation events and/or mismatch events; calculating the number of ligation events and/or mismatch events that occur between each individual overhang and its complement relative to the total number of ligation events that occur between the individual overhang and all of the overhangs in the set and their complements and the complement of the individual overhang and all of the overhangs in the set and their complements; wherein the set of overhangs correspond to the individual overhangs on each end of a plurality of double stranded polynucleotide fragments for ordered assembly into a target polynucleotide, wherein the individual overhangs are single
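The set constraints and the per-overhang fidelity calculation described above can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: the function names, the `counts` dictionary layout (ordered overhang pairs mapped to observed ligation counts) and the toy numbers are all hypothetical, and taking the product of the per-overhang scores as the overall score is an assumption.

```python
from math import prod

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(overhang):
    """Reverse complement of an overhang written 5'->3'."""
    return overhang.translate(COMPLEMENT)[::-1]

def is_valid_set(overhangs):
    """True if every overhang is unique, non-palindromic and not the
    complement of another member of the set."""
    seen = set()
    for oh in overhangs:
        rc = revcomp(oh)
        if oh == rc or oh in seen or rc in seen:
            return False
        seen.add(oh)
    return True

def fidelity(overhang, overhangs, counts):
    """Ligations between an overhang and its complement, divided by all
    ligations that either of them makes with the set and its complements."""
    rc = revcomp(overhang)
    pool = set(overhangs) | {revcomp(o) for o in overhangs}
    correct = counts.get((overhang, rc), 0) + counts.get((rc, overhang), 0)
    total = sum(counts.get((x, p), 0) for x in (overhang, rc) for p in pool)
    return correct / total if total else 0.0

def overall_fidelity(overhangs, counts):
    """Assumed aggregate: product of the per-overhang fidelity scores."""
    return prod(fidelity(o, overhangs, counts) for o in overhangs)

# Hypothetical counts: AGGA occasionally mis-ligates to TCCA,
# which is the complement of TGGA.
TOY_COUNTS = {
    ("AGGA", "TCCT"): 90, ("TCCT", "AGGA"): 90,
    ("AGGA", "TCCA"): 10, ("TCCA", "AGGA"): 10,
    ("TGGA", "TCCA"): 100, ("TCCA", "TGGA"): 100,
}
```

With these toy counts, `overall_fidelity(["AGGA", "TGGA"], TOY_COUNTS)` multiplies 180/190 by 200/210, so a single cross-reacting pair lowers the score of both overhangs involved.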
- Another feature of the method may include: in (a) receiving (iv) a nucleotide sequence of an assembly; and (v) a set of intervals in which the nucleotide sequence of (iv) can be enzymatically cleaved and identifying a non-redundant set of sub-sequences in the intervals that are the same length as the overhang length input in (ii), where each sub-sequence has an overhang; and the method further comprises: (h) storing the non-redundant set of sub-sequences having the set of overhangs with a suitable overall fidelity score.
- Another feature may include defining each interval of (v) by beginning and end coordinates in the nucleotide sequence of the assembly.
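One way to picture the interval input is the sketch below, which enumerates the candidate sub-sequences inside each interval and picks one per interval so that the chosen overhangs stay unique, non-palindromic and non-complementary. The names `candidates` and `pick_junctions`, the 0-based end-exclusive coordinates and the backtracking strategy are assumptions made for illustration; fidelity scoring of the chosen set is left out.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def candidates(sequence, interval, k=4):
    """All (position, k-mer) pairs lying fully inside a (start, end)
    interval; coordinates are assumed 0-based and end-exclusive."""
    start, end = interval
    return [(i, sequence[i:i + k]) for i in range(start, end - k + 1)]

def pick_junctions(sequence, intervals, k=4):
    """Choose one k-mer per interval by backtracking so that the chosen
    set is unique, non-palindromic and free of complementary pairs.
    Returns a list of (position, k-mer), or None if no choice exists."""
    chosen, used = [], set()

    def search(idx):
        if idx == len(intervals):
            return True
        for pos, kmer in candidates(sequence, intervals[idx], k):
            rc = revcomp(kmer)
            if kmer == rc or kmer in used or rc in used:
                continue  # palindrome, duplicate, or complement of a pick
            chosen.append((pos, kmer))
            used.add(kmer)
            if search(idx + 1):
                return True
            chosen.pop()
            used.discard(kmer)
        return False

    return chosen if search(0) else None

# Toy sequence: the first interval starts with the palindrome AATT and the
# second starts with CAAT, the complement of the first pick (ATTG).
junctions = pick_junctions("AATTGCCCCAATG", [(0, 5), (8, 13)])
```

In this toy case both rejected candidates are skipped and the junctions fall at positions 1 and 9.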
- Another feature may include: in (e) iterating (b)-(d) at least 1000 times.
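The repeated select-and-score loop can be illustrated with a simple random search. Everything below is a sketch under stated assumptions: `toy_count` is a made-up stand-in for an experimentally measured ligation table, the product of per-overhang scores is assumed as the overall score, and the source does not say which search strategy the tool uses.

```python
import random
from itertools import product
from math import prod

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]

def toy_count(a, b):
    """Hypothetical ligation count: 100 for a perfect Watson-Crick pair,
    5 for a single mismatch, 0 otherwise (not measured data)."""
    mismatches = sum(x != y for x, y in zip(b, revcomp(a)))
    return {0: 100, 1: 5}.get(mismatches, 0)

# Overhang table: every non-palindromic 4-base overhang.
OVERHANG_TABLE = [o for o in map("".join, product("ACGT", repeat=4))
                  if o != revcomp(o)]

def set_fidelity(overhangs):
    """Product of per-overhang fidelity scores over the whole set."""
    pool = set(overhangs) | {revcomp(o) for o in overhangs}
    scores = []
    for o in overhangs:
        rc = revcomp(o)
        correct = toy_count(o, rc) + toy_count(rc, o)
        total = sum(toy_count(x, p) for x in (o, rc) for p in pool)
        scores.append(correct / total)
    return prod(scores)

def random_valid_set(n, rng):
    """Draw n overhangs that are unique and not complementary to each other."""
    chosen, excluded = [], set()
    for o in rng.sample(OVERHANG_TABLE, len(OVERHANG_TABLE)):
        if o not in excluded:
            chosen.append(o)
            excluded.update((o, revcomp(o)))
            if len(chosen) == n:
                break
    return chosen

def search(n, iterations=1000, seed=0):
    """Repeat selection and scoring, keeping the best-scoring set."""
    rng = random.Random(seed)
    best, best_score = None, -1.0
    for _ in range(iterations):
        candidate = random_valid_set(n, rng)
        score = set_fidelity(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```

Calling `search(6)` returns the best of 1000 random six-overhang sets under the toy model; a fixed seed keeps the run reproducible.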
- Another feature may include: in (a) receiving the selected experimental conditions for enzymatic cleavage and ligation for ordered assembly of the polynucleotide fragments.
- Another feature may include: receiving the selected experimental conditions for providing the set of overhangs in (g) having a suitable fidelity and/or frequency score for annealing and for ligation with a selected ligase.
- Another feature may include causing the computer implemented method as described above to be executed and receiving an output containing the set of overhangs as identified in (g), and/or if (iv) and (v) are input, then receiving sequences for a set of polynucleotide fragments for ordered assembly, where the ends of the fragments are defined by the overhangs identified in (g).
- the computer implemented method may include obtaining sequences for a set of polynucleotide fragments having the identified non-redundant set of sub-sequences in the intervals that can be enzymatically cleaved to produce the identified overhangs. Another feature of the method may include establishing that the selected experimental conditions and the computer-generated set of overhangs are suitable for ordered assembly of a selected set of polynucleotide fragments with an effective amount of fidelity and frequency of complementary annealing and ligase dependent ligation for the number of fragments in the set.
- the experimental conditions may include selecting a DNA ligase, having a suitable fidelity and frequency score, for ligating the set of polynucleotide fragments containing 4-base overhangs
- the ligase is a wild type T4 DNA ligase, or a variant thereof selected from a thermostable T4 DNA ligase and a salt tolerant T4 DNA ligase wherein, the ligase is selected from the group consisting of: T4 DNA ligase, T7 DNA ligase, hLig3 DNA ligase, T3 DNA ligase, PBCV-1 DNA ligase, a temperature stable variant of any of T4 DNA ligase, T7 DNA ligase, hlig3 DNA ligase, T3 DNA ligase, or PBCV-1 DNA ligase and a high salt stable variant of any of T4 DNA ligase, T7 DNA ligase, hLig3 DNA
- the selected experimental conditions for ordered assembly of a target polynucleotide from the set of polynucleotide fragments include ligation conditions comprising one or more of, a salt concentration, a DNA repair enzyme, a temperature range and/or thermocycling conditions for cleavage and ligation.
- the salt concentration may be in the range of 50 mM-150 mM salt
- the DNA repair enzyme is EndoMS or T7 Endo I
- the temperature range is 37°C-50°C
- the thermocycling conditions are selected from drop-down, touch-down and touch-up temperature cycling.
- additional features may include: the nucleotide sequence of an assembly selected from a virus genome, a prokaryotic genome, an operon and a metabolic pathway; and wherein the number of polynucleotide fragments to produce an assembly is in the range of 2-100 fragments.
- a computer-readable medium is provided for performing the methods described by suitable software.
- a method for synthesizing a target polynucleotide that includes: (a) obtaining a set of overhangs that have a suitable overall fidelity score under a set of experimental conditions including selection of a ligase using the computer implemented method described above; wherein the computer instructs an automated instrument or a user to assemble, under the set of selected experimental conditions, determined at least in part by the user, a set of polynucleotide fragments having sequences optionally determined by the computer or by the user, that have been enzymatically obtained or chemically synthesized; (b) permitting the optionally automated ordered assembly of a target polynucleotide by combining a ligase, restriction endonuclease and the polynucleotide fragments under the selected experimental conditions within the instrument or in a reaction tube; and (c) optionally introducing the target polynucleotide into: (i) a bacterial cell; or (ii) into an in vitro system, for expression of the gene or
- This method enables assembly of the target polynucleotide by repeating steps (a) and (b) such that in the first round, the polynucleotide fragments are less than 1000 bases in length so that the assembled fragments form an interim target polynucleotide and the interim target polynucleotides form the polynucleotide fragments for the next round of ordered assembly to form the final target polynucleotide.
- the set of polynucleotide fragments in (a) is 2-100 fragments, more specifically 20-100 fragments or at least 20 fragments.
- the method may include performing multiplex amplification of the set of polynucleotide fragments prior to (b).
- the target polynucleotide may be a DNA that may be transcribed to form a target RNA.
- the target polynucleotide may be a DNA and wherein the DNA is expressed in cells to produce one or more proteins.
- the target proteins may be part or all of a metabolic pathway, a viral genome or an immune cell gene.
- a method of performing an ordered DNA assembly from 20-100 DNA fragments to create a large DNA has the following steps that can be performed in any order: (a) obtaining instructions from a computer design tool for an optimized set of 4-base overhang sequences for joining 20-100 fragments in an ordered assembly reaction, wherein the computer design tool computes the optimal set of overhangs from one or more sets of data, wherein each set of data results from frequency and fidelity analysis of individual ligase preferences for all combinations of four base overhangs; and (b) obtaining 20-100 fragments having the optimized set of 4-base overhangs for ligation with a selected ligase in an ordered assembly reaction to create a large DNA.
- the method may include: adding a Type IIS restriction endonuclease recognition sequence to the 20-100 fragments using a polymerase chain reaction (PCR) or inserting the 20-100 fragments into 20-100 plasmids having a Type IIS restriction endonuclease recognition sequence at the insertion site at each end of the fragment or synthesizing the 20-100 fragments with the optimized 4-base overhangs.
- a Type IIS restriction endonuclease may be selected that has a recognition sequence of 5'CACCTGC3' and the cleavage site to create the optimized set of 4-base overhangs is 5'CACCTGC (N4)3' (SEQ ID NO:2) and 3'GTGGACG(N8)5' (SEQ ID NO:3).
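The cut geometry stated above (top strand cleaved 4 nucleotides and bottom strand 8 nucleotides downstream of CACCTGC) can be sketched as a small function that reports the resulting 4-base overhang. The function name and return format are hypothetical, and the sketch only scans the strand it is given; sites on the opposite strand could be found by passing the reverse complement.

```python
SITE = "CACCTGC"      # recognition sequence, read 5'->3' on the top strand
TOP_OFFSET = 4        # top strand is cut 4 nt downstream of the site
BOTTOM_OFFSET = 8     # bottom strand is cut 8 nt downstream of the site

def paqci_overhangs(top_strand):
    """Return (top_cut_position, overhang) for each site on the given
    strand. The overhang is the 4-base stretch between the two cuts."""
    results = []
    start = top_strand.find(SITE)
    while start != -1:
        site_end = start + len(SITE)
        top_cut = site_end + TOP_OFFSET
        bottom_cut = site_end + BOTTOM_OFFSET
        # Skip sites too close to the end of the sequence to be cut.
        if bottom_cut <= len(top_strand):
            results.append((top_cut, top_strand[top_cut:bottom_cut]))
        start = top_strand.find(SITE, start + 1)
    return results

# Toy substrate: site, 4 spacer bases, then the overhang GGAG.
example = paqci_overhangs("GG" + SITE + "AAAA" + "GGAG" + "TTTT")
```

Because the overhang lies outside the recognition sequence, its four bases can be chosen freely when the flanking sequence is designed.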
- the restriction endonuclease is PaqCI.
- the ligase may be selected from one or more of T4 DNA ligase, T7 DNA ligase, hLig3 DNA ligase, T3 DNA ligase, or PBCV-1 DNA ligase.
- a method for high-throughput assembly of customized T-cells includes the steps of: (a) identifying a surface antigen on a tumor cell from a patient, wherein the protein is specific for the tumor cell; (b) collecting T-cells from the patient; (c) causing an ordered assembly of DNA fragments with 4-base overhangs to form a large DNA encoding a chimeric antigen receptor that is tumor antigen specific; and (d) introducing the large DNA into the genome of the T-cell that has been cleaved by site-directed CRISPR.
- the large DNA in (d) may be the product of ordered assembly of a plurality of DNA fragments that are conserved and a plurality of variable DNA fragment sequences such that at least the conserved DNA fragments are individually stored in plasmids in bacterial cells for high throughput assembly of the customized T-cells.
- a method for creating viral genomes with mutations that include: (a) generating a plurality of fragments for ordered assembly into a viral genome; (b) selecting four base overhangs that permit ligation of multiple mismatches by a ligase; and (c) testing the product viral genome for antibiotic activity or as a substrate for vaccine production.
- the ligase is a relatively low fidelity ligase for example, hLig3.
- the above methods may be accomplished in high throughput workflows using microfluidic devices or robotic devices to handle multiple samples in repetitive cycles of joining fragments to create any size of DNA from small fragments of DNA.
- FIG. 1A-1B shows the PaqCI performance comparison for 24 fragment assembly (efficiency) as determined by the number of colonies with blue phenotype indicating correct LacI/LacZ assembly per 50 µl outgrowth (1/20 total outgrowth).
- PaqCI from Paucibacter aquatile together with a synthetic activator oligonucleotide (also referred to as “activator” or “oligonucleotide”) and T4 DNA ligase provided greatly enhanced efficiency of colony formation and fidelity of sequences in the assembled large DNA from 24 fragments compared with AarI.
- FIG. 1A PaqCI provided a more than 10-fold greater number of colonies having correct assembly than observed for AarI over 30 cycles.
- FIG. 1B PaqCI provided at least 15% greater fidelity than AarI in 30 cycles (5 minutes 37°C to 5 minutes 16°C) for a 24 fragment assembly reaction of a LacI/LacZ cassette as determined by blue colonies.
- FIG. 1C shows the recognition sequence and asymmetric cut site for PaqCI to produce a 4-base overhang.
- FIG. 1D shows that unlike AarI, PaqCI cuts to completion and does not exhibit star activity.
- FIG. 2A-2C provides a schematic of an assay to determine how fragments with different sequence overhangs are affected by ligation bias and the fidelity of the ligation event.
- FIG. 2A Libraries containing randomized four base overhangs were synthesized. Sample randomized overhang pairs are schematically represented.
- FIG. 2B Ligation substrates are ligated with a specified DNA ligase and correct (same overhang shading) and mismatch containing (different overhang shading) products are formed.
- the correct or mismatch status of each product was analyzed using SMRT® sequencing (Pacific Biosciences, Menlo Park, CA).
- FIG. 2C Ligation fidelity is defined as the fraction of correct ligations. Ligation bias is detected by differences in total numbers of ligation products formed for each overhang.
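The two quantities defined for FIG. 2C can be computed directly from a list of observed ligation products. The sketch below assumes each sequenced product has already been reduced to a pair of 5'->3' overhang sequences; the `summarize` name, the input format and the toy events are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]

def summarize(events):
    """events: iterable of (overhang_a, overhang_b) pairs, one per
    sequenced ligation product.
    Returns (fidelity, per_overhang_counts) where fidelity is the
    fraction of fully Watson-Crick products and the counts show how
    often each overhang appears in any product (a view of bias)."""
    total = 0
    correct = 0
    per_overhang = Counter()
    for a, b in events:
        total += 1
        if b == revcomp(a):
            correct += 1
        per_overhang[a] += 1
        per_overhang[b] += 1
    fidelity = correct / total if total else 0.0
    return fidelity, per_overhang

# Toy data: AGGA ligates often and mostly correctly; TAAA ligates rarely.
toy_events = ([("AGGA", "TCCT")] * 8
              + [("AGGA", "TCCA")] * 2
              + [("TAAA", "TTTA")])
fidelity, usage = summarize(toy_events)
```

Here fidelity is 9/11, while the usage counts (10 products containing AGGA against 1 containing TAAA) illustrate ligation bias independently of fidelity.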
- FIGs. 3Ai/3Aii-3Hi/3Hii show significant variation between different ligases with respect to sequence preferences with observed variation between correct ligations and mismatch ligations. The number and type of 4-base sequences that are underrepresented also varies between ligases. This reveals bias of at least 2 types: bias for or against a ligation event for certain 4-base overhangs and bias for ligation of mismatches and against perfect matches or vice versa.
- 3Ai-3Hi shows a ligation frequency heat map matrix of all ligation events (log-scaled). Overhangs are listed alphabetically left to right (AAAA, AAAC, AAAG ... TTTG, TTTT) and bottom to top such that the Watson-Crick pairings are shown on the diagonal.
- the matrix shows ligation frequency for each of 256 x 4-base overhangs on the X axis against 256 x 4-base overhangs on the Y axis. Each base in the 4-base overhang is color coded where T is red, C is blue, G is yellow and A is green (colors represented by different shades of grey).
- 3Aii-3Hii shows a stacked bar plot of frequency of ligation products containing each overhang, corresponding to each column in the corresponding heat map.
- Fully Watson-Crick paired ligation results are indicated in blue, and ligation products containing one or more mismatches are in orange (represented by two shades of grey).
- Certain overhangs are under-represented as indicated by arrows.
- FIG. 3Ai and 3Aii is T4 DNA ligase. TAAA, TCAA, TGAA and TTAA are underrepresented.
- FIG. 3Bi and 3Bii is T7 ligase. Many 4-base overhangs are underrepresented.
- FIG. 3Ci and 3Cii is Human ligase 3 (hLig3).
- CAAG, CCAG, CGAG, CTAG, TAAA, TCAA, TGAA and TTAA are underrepresented.
- FIG. 3Di and 3Dii is T3 ligase. TAAA, TCAA, TGAA and TTAA are underrepresented.
- FIG. 3Ei and 3Eii is PBCV-1 ligase. TAAA, TCAA, TGAA and TTAA are underrepresented.
- FIG. 3Fi and 3Fii is T4 ligase + PEG. TAAA, TCAA, TGAA and TTAA are underrepresented.
- FIG. 3Gi and 3Gii is T7 ligase + PEG showing the beneficial effect on ligation using T7 DNA ligase.
- FIG. 3Hi and 3Hii is hLig3 + PEG.
- CAAG, CCAG, CGAG, CTAG, TAAA, TCAA, TGAA and TTAA are underrepresented.
- FIG. 4A shows how median ligation and the spread of bias according to base content of overhangs varies for different ligases as determined by frequency of ligation of every combination of 256 different overhangs from a sequenced library for each ligase.
- T4 DNA ligase, T7 DNA ligase, T3 DNA ligase, and PBCV-1 DNA ligase have a similar median bias (shown by the black horizontal line) with a similar distribution of positive bias for GC rich overhangs but some variation in amounts and extents of negative bias for AT rich overhangs.
- T7 ligase exhibits greater median ligation bias than the other ligases with few overhangs ligated very efficiently, and the majority of overhangs ligated with much less efficiency where frequency of ligation (y-axis) is a measure of efficiency of ligation.
- each dot was colored according to its %GC content with different colors for 0%, 25%, 50% and 75% and 100%.
- the distribution of dots shows that GC-rich overhangs tend to ligate more efficiently compared to AT-rich overhangs.
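The GC classes used to color the dots follow from the overhang sequence alone. A minimal sketch, with hypothetical function names, that assigns each of the 256 possible 4-base overhangs to its %GC class:

```python
from collections import defaultdict
from itertools import product

def gc_percent(overhang):
    """Percentage of G or C bases in an overhang (integer for 4-mers)."""
    return 100 * sum(base in "GC" for base in overhang) // len(overhang)

def group_by_gc(overhangs):
    """Map each %GC value to the list of overhangs in that class."""
    groups = defaultdict(list)
    for oh in overhangs:
        groups[gc_percent(oh)].append(oh)
    return dict(groups)

ALL_OVERHANGS = ["".join(p) for p in product("ACGT", repeat=4)]
GC_GROUPS = group_by_gc(ALL_OVERHANGS)
GROUP_SIZES = {gc: len(members) for gc, members in sorted(GC_GROUPS.items())}
```

The five classes hold 16, 64, 96, 64 and 16 overhangs for 0%, 25%, 50%, 75% and 100% GC respectively, so per-class medians rest on unequal numbers of points.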
- the results shown were generated by SMRT sequencing of ligation reactions with 100 nM of the multiplexed four-base overhang substrate with 1.75 µM T4 DNA ligase, T7 DNA ligase, human DNA ligase 3, T3 DNA ligase, or PBCV-1 DNA ligase incubated 1 hour at 25°C in standard ligation buffer.
- FIG. 4B shows that the median ligation fidelity (see black line) for T4 DNA ligase, T7 DNA ligase, human DNA ligase 3, T3 DNA ligase, and PBCV-1 varies and the spread of GC rich and AT rich overhangs across the ligation fidelity profile also varies between enzymes.
- T7 DNA ligase shows the highest ligation fidelity.
- hLig3 shows the lowest ligation fidelity and also the widest spread of data points below the median line.
- Ligation fidelity was calculated and plotted for all ligases studied. Ligation fidelity is defined as the percentage of correct (Watson-Crick) versus incorrect (mismatch) ligation events.
- FIG. 5A-5F shows that polyethylene glycol (PEG) has a significant positive effect on the ligation frequency of overhangs with relatively low GC (a group of overhangs that generally show lower ligation frequency in the absence of PEG) but a slightly negative effect on ligation fidelity.
- the ligation frequency and ligation fidelity of overhangs is grouped by GC content. The median value is indicated by a horizontal line (dotted line for ligation reactions completed in buffer that did not contain PEG; black for ligation reactions completed in PEG-containing buffer).
- FIG. 5A is T4 DNA ligase (frequency) +/- PEG.
- FIG. 5B is T7 DNA ligase (frequency) +/- PEG.
- FIG. 5C is hLig3 DNA ligase (frequency) +/- PEG.
- FIG. 5D is T4 DNA ligase (fidelity) +/- PEG.
- FIG. 5E is T7 DNA ligase (fidelity) +/- PEG.
- FIG. 5F is hLig3 DNA ligase (fidelity) +/- PEG.
- FIG. 6 shows, by means of a high-level block diagram, a system for generating an estimated overall ligation fidelity for a user-specified overhang sequence set; and also experimental conditions to achieve a desired result.
- the system utilizes client 802 having bidirectional data communication 803 with a server 804 that in turn has access to storage 806 via 808 where 806 includes a database of 4-base 5'-3' ligation fidelity. This can also be a 2-base, 3-base or 5-base database.
- Bidirectional data communication 803 may be implemented using a local connector such as a local area network (LAN) or a wide area network.
- Server 804 may be a dedicated resident server or may be implemented in the cloud.
- Data storage 806 may be co-located with server 804.
- client 802 may include a browser interface.
- client 802 may host a graphical user interface for the user to enter sets of 5'-3' 4-base overhangs in the canonical form of AGCT, or other overhang sets, or to select experimental conditions for ligation such as a selected restriction endonuclease, a selected ligase, a buffer containing PEG, temperature and time of reactions, and other experimental details.
- FIG. 7 shows input and output steps in a high-level flow diagram for execution of an assembly reaction using the system outlined in FIG. 6.
- user enters a set of overhang sequences of any desired length, for example, the set of overhang sequences will be a set of 5'-3' 4-base overhang sequences 902.
- the set will contain more than one 4-base overhang sequence such that each member of the set differs from all the other members of the set.
- Each overhang sequence represents a member of a single overhang pair that the user preferably wishes to use in an experiment to join in order a plurality of double stranded nucleic acid fragments.
- the 4-base overhang may be represented as a Watson-Crick pair of overhangs.
- a single overhang pair in a set may vary with respect to ligation fidelity depending on whether a particular sequence is a 5' sequence or its complement. Each member pair is considered separately from the other member pairs in the set.
- the user may select experimental conditions for ligation of fragments having overhangs corresponding to the entered sequences in 902. These experimental conditions include, for example, time of incubation with ligase, temperature of incubation, and ligation frequency and fidelity for selected ligases 904.
- By accessing a database of ligation fidelity for individual overhangs or overhang pairs, the system generates an output describing the ligation fidelity for the entire entered overhang sequence set and/or for individual overhang pairs in the set 906.
- the system may additionally output a graphical matrix representation of ligation fidelity for the selected overhang sequence pairs. If the identified fidelity efficiency of the set of 4-base overhang sequences input by the user is rejected by the user, the user is enabled to assess the ligation of the identical set of 4-base overhangs under different selected experimental conditions or to enter a modified set of 4-base overhang sequences under the same or different experimental conditions to determine how to join the set of double stranded nucleic acid fragments in an ordered assembly.
- FIG. 8 is a high-level flow diagram showing inputs in addition to system output steps.
- input 1302 -1306
- outputs 1308-1312 Individual examples are provided for user entry of input (1302 -1306) generating outputs 1308-1312.
- the input parameters in 1302-1306 may be substituted or added to by any one or more or two or more of the following:
- ligase e.g., T4 DNA ligase, T7 DNA ligase, PBCV-1, T3 ligase, hLig3 or any other ATP dependent DNA ligase or NAD+ dependent ligase such as Taq DNA ligase;
- a choice of restriction endonuclease e.g. one or more of Esp3I, SapI, BbsI-HF, BspQI, HgaI, BsaBI, BsaJI, BsaI, BsaI-HFv2, BsiI, BsmAI, BsmBI, BsmFI, BsmI, BsrDI
- Output may include one or more of the following:
- FIG. 9A-9D shows how the data presented in FIG. 3Ai/3Aii-3Hi/3Hii to FIG. 5A-5F can be integrated into a computer.
- additional parameters include use of PEG and/or aprataxin in the buffers selection.
- a further drop down menu for adding to the user interface page in FIG. 9B is a drop down menu that permits choice of a ligase and this will influence the selection of overhangs based on frequency, bias and fidelity data described herein.
- FIG. 9B shows the Ligase Fidelity Viewer.
- FIG. 9C shows the drop down menus for GetSet, the interface that will inform the user as to how good their chosen set of overhangs will perform in a specified ligation assembly reaction and whether certain overhangs should be included or excluded from the set.
- FIG. 9D shows the drop down menu for SPLITSET which informs the viewer what sites should be included and which should be excluded in an in silico sequence for the production of fragments from the corresponding DNA by targeted cleavage or by DNA synthesis.
- FIG. 10 shows that PEG increases the frequency of colonies obtained from multi-fragment assembly with T4 DNA ligase and BbsI-HF restriction endonuclease for a particular concentration of DNA compared to the same DNA in the absence of PEG. All PEG sizes showed some improvement.
- Preferred embodiments included PEG 3350 and PEG 6000.
- FIG. 11 shows that PEG 6000 enables the use of 10 fold less DNA to achieve substantial colony representation following assembly of 24 fragments of DNA using T4 DNA ligase and BbsI-HF.
- FIG. 12A and 12B shows that 50 DNA fragments having overhangs determined by the computer tool described in FIG. 9A-9D that included adjustments for the ligation preferences for T4 DNA ligase enabled improved efficiency of assembly of the T7 viral genome from the 50 fragments as determined by plaques on a lawn of bacteria.
- FIG. 13A and 13B show that plaques obtained on a lawn of bacteria do indeed contain intact phage T7 DNA.
- FIG. 14 shows that the percentage of colonies that contain correctly assembled constructs is maintained at least 50% for 52 fragments using the tools described herein to design overhangs for correct end joining. These results are obtained from one pot fragment assembly reactions.
- FIG. 15 shows a cartoon of how improved multi-fragment assembly methods can be used for scale-up for CAR-T cell therapy for thousands of individual patients. Tumor cells from individual patients are analyzed to discover their unique tumor specific antigens and the DNA sequences for genes that encode these neoantigens containing mutations. The patient's own T-cells are removed and engineered to insert an assembled gene at a target site in the genome that has been recognized and cleaved by CRISPR.
- the T-cells can then be reintroduced into the patient to destroy the tumor cells.
- a subset of multiple components required to synthesize a tumor antigen will be conserved and a subset of components will not be conserved.
- the entire region of interest may be maintained in plasmid libraries ready for use and individual non conserved fragments where the mutation is identified can be used in the assembly reaction. In this way, the entire gene need not be made de novo for each patient allowing for higher throughput of samples in the workflow.
- FIG. 16 shows a cartoon of phage engineering to treat drug resistant bacterial infections for potential antibiotic solutions.
- a phage genome is divided into small pieces and various mutations introduced into any one or more fragment.
- the engineered phage can be assayed for their ability to invade and destroy the target bacteria.
- target polynucleotide refers to the end product of a ligation based ordered assembly of fragments that may be DNA, RNA or a mixture thereof.
- polynucleotide fragments refer to the building blocks that when assembled, create the target polynucleotide. These building blocks may be derived from sequence databases and may contain promoter sequences, enhancer sequences, coding sequences etc. Polynucleotide fragments may be made by chemical synthesis (IDT, Coralville, IA) or by enzymatic synthesis using for example, a terminal transferase-based synthesis. The fragments made in this way may be assembled in a preliminary step from the products of chemical and/or enzymatic synthesis to form larger polynucleotide fragments suitable for assembly into a gene.
- where polynucleotide fragments are amplified from a template, e.g., by PCR, their length does not exceed the processive capability of the polymerase used in amplification.
- amplicons rarely exceed 5 kb-10 kb and may have a minimum length of 15 nucleotides.
- oligonucleotide in the intended context refers to a multimer of at least 10, e.g., at least 15 or at least 30 nucleotides. In some embodiments, an oligonucleotide may be in the range of 15-500 nucleotides in length, or more. Any oligonucleotide used herein may be composed of G, A, T and C, or bases that are capable of base pairing reliably with a complementary nucleotide.
- Modified nucleotides may optionally be included in an overhang sequence and hence in a ligation efficiency database.
- a plurality of letters in a specific order describes figuratively the base (also referred to as nucleotide) composition of a molecule.
- the terms "perfect match”, “complementary” and Watson and Crick pairs each refer to the pairing by hydrogen bonds of bases on separate strands of a duplex DNA where A is matched to a T or U and G is matched to a C.
- junction refers to a position in a target polynucleotide where component polynucleotide fragments have been joined by a ligase.
- junction also refers to a position in a sequence of a target polynucleotide in a database where fragmentation is recommended for assembly of a target polynucleotide from an optimized set of fragments.
- the context of the word “junction” will make clear which of the two meanings is intended.
- the assembly methods described herein may be used to create scarless junctions in the target polynucleotide meaning that the junction in the target polynucleotide would be indistinguishable from the corresponding position in the original polynucleotide sequence.
- the term "overhang” refers to a single stranded region at the end of a double stranded fragment polynucleotide for example DNA.
- the overhang is preferably formed by an enzyme that creates a staggered cleavage of the nucleic acid on both strands of the duplex outside the recognition region.
- the overhangs are generally 5' overhangs.
- the overhang can be defined by its length and its sequence. For example, there are 256 different possible 4-base overhangs (4^4). Overhangs of 2-bases, 3-bases, 4-bases and 5-bases are exemplified here, generated by restriction endonuclease cleavage.
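The count of 256 can be checked by enumeration. The following is a minimal sketch, not part of the described system; the function names are hypothetical. It lists every overhang of a given length, pairs each with its reverse complement, and counts the self-complementary (palindromic) sequences, which can anneal to themselves.

```python
from itertools import product

# Watson-Crick partners for the four DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'-3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def all_overhangs(length):
    """Enumerate every possible overhang sequence of the given length."""
    return ["".join(bases) for bases in product("ACGT", repeat=length)]


overhangs = all_overhangs(4)

# An overhang that equals its own reverse complement is palindromic.
palindromes = [o for o in overhangs if o == reverse_complement(o)]

# Each non-palindromic overhang shares a Watson-Crick pair with exactly one
# other sequence, so collapse the list into unordered pairs.
watson_crick_pairs = {frozenset((o, reverse_complement(o))) for o in overhangs}

print(len(overhangs), len(palindromes), len(watson_crick_pairs))
```

Under these definitions the 256 4-base sequences collapse into 136 distinct Watson-Crick pairs, 16 of which are palindromic.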
- the overhang can contain 2-8 bases although 3 or 4-base overhangs are generally preferable.
- the preference derives from the availability of restriction endonucleases that cleave double stranded DNA outside the recognition site to produce 3 or 4-base overhangs and from the number of possible overhang pairs in a set which is sufficient to optimize ligation of a plurality of polynucleotide fragments to form a target polynucleotide.
- Matching the overhang from one polynucleotide fragment with a second complementary overhang on a second polynucleotide fragment results in a junction if a ligase is added to the mixture and ligation occurs depending on the ligase preferences for the overhang sequence and its complement.
- the first overhang and the second complementary overhang are referred to as overhang pairs or complementary overhangs. While not wishing to be limited by theory, it is proposed here that combining the ligase with the restriction endonuclease in a single assembly reaction mixture results in a significant reduction in inappropriate hybridization and ligation events. These inappropriate events occur when a cleavage product that consists of an overhang and the restriction endonuclease recognition sequence reconnects with the assembly fragment from which it has been cleaved or reconnects with another cleavage product.
- sequences are cleaved again by restriction endonucleases in the reaction mix to liberate the polynucleotide overhangs for proper ligation to the compatible polynucleotide fragment partner.
- Other inappropriate events may occur when non complementary overhangs anneal, resulting in mismatches. This generally occurs only with one or two mismatches and can affect the order of assembly unless the occurrence of mismatches of annealed overhangs is factored into the assembly strategy.
- the term "inputs" refers to the information the user enters into the computer. These may include: specified reaction conditions, a target polynucleotide sequence that can be divided into polynucleotide fragments, excluded overhangs, included overhangs, and the number of desired fragments or overhangs. Input parameters are received by the computer.
- outputs refer to instructions from the computer that enable the user to make the desired target polynucleotide. These may include: overhang sets with preferred ligation fidelity scores for a specified number of junctions, and/or full polynucleotide fragment sequences based on input of the target polynucleotide. Where polynucleotide fragment sequences are entered by user, then the computer output may include pairs of overhangs that avoid internal sites, palindromes and repeat overhangs and provide a high overall fidelity score for the specified reaction conditions including cycling conditions, incubation time and temperature and recommended enzymes for optimizing ligation fidelity. Computer outputs may further provide graphical display of fragment assembly design and fragment sequences or link to the same. A computer output may also provide a matrix of ligation frequencies for all combinations of the selected overhangs in order to graphically illustrate the predicted fidelity for a chosen set of overhangs and their complements or link to the same.
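The screening for palindromes and repeat overhangs mentioned above can be illustrated with a short sketch. This is an assumption about how such a check might be written, not the tool's actual code; `screen_overhang_set` and the reason labels are hypothetical names.

```python
# Watson-Crick partners for the four DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'-3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def screen_overhang_set(overhangs):
    """Flag palindromic overhangs and overhangs that repeat an earlier
    member of the set or its complement. Returns (overhang, reason) tuples
    in input order; an empty list means no problem was found."""
    problems = []
    seen = set()
    for overhang in overhangs:
        overhang = overhang.upper()
        partner = reverse_complement(overhang)
        if overhang in seen:
            # Same sequence, or the complement of one, was already used.
            problems.append((overhang, "repeat"))
        elif overhang == partner:
            # A palindromic overhang can ligate to another copy of itself.
            problems.append((overhang, "palindrome"))
        seen.add(overhang)
        seen.add(partner)
    return problems


print(screen_overhang_set(["GGAG", "TACT", "AATG", "AGGT"]))
print(screen_overhang_set(["GGAG", "CTCC", "AGCT", "GGAG"]))
```

The first set passes with no flags. In the second, CTCC is flagged because it is the complement of GGAG, AGCT because it is palindromic, and the second GGAG because it repeats the first.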
- the tool can provide ligation data in a graphical output, indicating the general efficiency of each connection.
- the checkbox can be toggled to display normalized ligation counts.
- the relative ligation frequency was experimentally determined for all 256 4-base overhangs in a single experiment. Total ligation events for each experiment were normalized to 100,000; in this case, a typical frequency for any single Watson-Crick pair was 300-400 observations per 100,000 ligation events. Further details are provided in: Potapov, et al. Nucleic Acids Research, 46, e79 (2016); Potapov, et al. Cold Spring Harbor Laboratory, bioRxiv, doi: https://doi.org/10.1101/322297 (2016); and Potapov et al. ACS Synthetic Biology 7(11), 2665-2675 (2016).
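The normalization to 100,000 total ligation events is a simple rescaling of raw counts. The sketch below assumes the raw data are held as a mapping from overhang pairs to observed counts; the function name and the toy counts are hypothetical and are not experimental values.

```python
def normalize_counts(raw_counts, total=100_000):
    """Rescale raw ligation counts so that they sum to `total`.

    raw_counts maps an (overhang, overhang) pair to the number of times that
    ligation product was observed in one experiment.
    """
    observed = sum(raw_counts.values())
    if observed == 0:
        raise ValueError("no ligation events were observed")
    return {pair: count * total / observed for pair, count in raw_counts.items()}


# Toy counts for illustration only (not measured data).
toy_counts = {
    ("GGAG", "CTCC"): 60,
    ("AATG", "CATT"): 30,
    ("GGAG", "CATT"): 10,
}
normalized = normalize_counts(toy_counts)
print(normalized)
```

Normalized counts from different experiments can then be compared directly, because each is expressed per 100,000 ligation events.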
- the term "experimental conditions" refers to choices of a ligase, endonuclease and/or other enzymes as desired for the workflow and their unit ratio.
- the conditions also refer to buffers and cofactors in the buffers.
- the ligase to restriction endonuclease unit ratio may be within the range of 1:10-1:1000 regardless of the type of DNA ligase or Type IIS restriction endonuclease selected.
- Experimental conditions may include salt concentrations, temperature and time used to complete ligation of overhangs and may further include cycling conditions for ligation reactions. Experimental conditions may be selected to reduce the assembly time for large numbers of fragments or to improve the fidelity score of the selected set of overhangs.
- Experimental conditions may also affect removal of mismatches in the target polynucleotide.
- Watson/Crick perfect matches may be preferred although in some cases a single base mismatch in the overhang may provide a higher fidelity score for ordered assembly than a perfect match of bases that do not readily hybridize as deduced from the ligation frequency tables.
- Alternative splicing may also occur during assembly resulting in a mismatch at a junction.
- Mismatches can be removed using EndoMS or T7 Endo I, or other repair enzyme that identifies mismatches, to cleave the DNA at the mismatch.
- the term "experimental conditions" includes ligation conditions and the context will determine if these terms are interchangeable.
- ligation frequency refers to the number of times an overhang will ligate to another overhang out of a total number of ligations (e.g. 100,000 ligations).
- ligation fidelity refers to a numerical assessment of discrimination against the ligation of substrates containing mismatched base pairs and of bias (preferential ligation of particular sequences over others). Ligation fidelity also refers to the fraction of ligation events that are correct (Watson-Crick ligation products) versus incorrect (mismatch products). In a 4-base overhang, the possibilities are that no base is mismatched (Watson-Crick ligation product), there is a 1-base mismatch, 2-base mismatch, 3-base mismatch or all 4-bases are mismatched.
- ligation fidelity by overhang or "ligation fidelity score for an individual overhang” refers to the frequency at which an individual overhang and its complement independently ligate to a perfectly complementary overhang relative to all overhangs in a set and their complements.
- a fidelity score can be calculated by consulting a ligation frequency table, which comprises individual experimentally defined measurements of the number of ligation events for each overhang to all overhangs of the same length (including itself).
- a ligation fidelity score for an individual overhang is calculated as the number of ligation events that occur between the individual overhang and its complement relative to the total number of ligation events that occur between (i) the individual overhang and all of the overhangs in the set and their complements; and (ii) the complement of the individual overhang and all of the overhangs in the set and their complements.
- ligation fidelity of an entire set and “overall fidelity score” refer to the expected ratio of correctly ligated assemblies to incorrectly ligated assemblies based on the individual ligation fidelity scores for each member of a given set of overhangs.
- An overall fidelity score for a set of overhangs can be calculated by multiplying the individual ligation fidelity scores for the overhangs in the set together.
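The two calculations above can be sketched as follows. This is an illustrative reading of the definitions, not the tool's implementation. It assumes the frequency table is stored as a mapping from an unordered pair of overhangs to a count, and that each unordered pairing is counted once in the denominator, so that a junction with no mismatch events scores 1. All function names and the toy counts are hypothetical.

```python
# Watson-Crick partners for the four DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'-3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def individual_fidelity(table, overhang, overhang_set):
    """Correct ligation events for one overhang and its complement, divided
    by all events that either of them has with any member of the set or
    with the complement of any member.

    Assumption: table maps frozenset({a, b}) to a count, and each unordered
    pairing is counted once."""
    partner = reverse_complement(overhang)
    pool = set(overhang_set) | {reverse_complement(o) for o in overhang_set}
    pairings = {frozenset((x, o)) for x in (overhang, partner) for o in pool}
    total = sum(table.get(pairing, 0) for pairing in pairings)
    if total == 0:
        raise ValueError("no ligation events recorded for " + overhang)
    return table.get(frozenset((overhang, partner)), 0) / total


def overall_fidelity(table, overhang_set):
    """Multiply the individual fidelity scores of every overhang in the set."""
    score = 1.0
    for overhang in overhang_set:
        score *= individual_fidelity(table, overhang, overhang_set)
    return score


# Toy counts for illustration only (not measured data).
toy_table = {
    frozenset(("GGAG", "CTCC")): 90,  # correct Watson-Crick ligation
    frozenset(("AATG", "CATT")): 80,  # correct Watson-Crick ligation
    frozenset(("GGAG", "CATT")): 10,  # mismatch ligation
}
toy_set = ["GGAG", "AATG"]
print(individual_fidelity(toy_table, "GGAG", toy_set))
print(overall_fidelity(toy_table, toy_set))
```

In this toy table the single mismatch event lowers the score of both junctions it touches, which is why the overall score falls faster than either individual score as the set grows.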
- all assembly fidelity refers to the actual number of correctly assembled target nucleic acids compared to the predicted number of correctly assembled target nucleic acids.
- assembly efficiency of 10 polynucleotides with overhangs can be determined by the number of times all 10 junctions are ordered correctly in the population of target polynucleotides. Assembly fidelity may be greater than 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 90%.
- ligation efficiency refers to the number of correct assemblies as a function of time.
- assembly efficiency refers to the rate at which full length ligation products (complete target nucleic acids as determined by size or colony formation or sequencing) accumulate in a particular assembly reaction after a particular time period.
- An arbitrary unit of time may be selected which will provide an overall average/unit time for ordered assembly of a target polynucleotide.
- the ligation efficiency may not be linear over a selected incubation period.
- ligation yield refers to the number of correct assemblies.
- ligation accuracy refers to the number of correct end joining of fragments over number of total assemblies. This may be determined by sequencing.
- ligation refers to the product of assembly which requires a DNA ligase to join fragments.
- ligation is attributable to specific features of bias and/or fidelity of the ligation event for different ligases where it was found that variability existed in a manner that could be useful or detrimental to a planned assembly.
- Ligase refers to an enzyme that is capable of joining two polynucleotides covalently. Many different ligases have been described in the art and are widely known (see Ellenberger et al. Annual Review in Biochemistry, 77, 313-338 (2008); Bauer et al. PLOS ONE, 10, 12:e0145046 (2017)).
- Ligases for use in assembly reactions may include ATP ligases and NAD+ ligases such as T4 DNA ligase, T7 DNA ligase, Taq DNA ligase, viral ligases such as chlorella virus DNA ligase (e.g., PBCV-1 ligase), bacterial ligases such as bacterial LigA (e.g., E. coli DNA ligase) and LigD; archaeal ligases such as Thermus thermophilus (Tth) Ligase and eukaryotic ligases such as Mammalian Lig1 and hLig3.
- multi-fragment assembly refers to multiple DNA fragments or a set of DNA fragments of any size greater than about 15 nucleotides that have been synthesized chemically or within plasmids in a library of bacteria containing plasmids with different inserts.
- the fragments may be all a similar or the same size or may have various sizes.
- PaqCI refers to a 7-base cutter restriction endonuclease derived from Pauciibacter aquatica.
- the endonuclease identified here as PaqCI includes any variant having at least 80%, 85%, 90%, or 95% sequence identity to SEQ ID NO: 1.
- the ordered assembly of multiple polynucleotide fragments into a single DNA relies on the use of two different enzymes, namely a Type IIS restriction endonuclease and a ligase.
- Type IIS restriction endonucleases recognize 4, 5, 6 or 7-bases in a DNA and cleave outside the recognition sequence to provide polynucleotide fragments with overhangs that may be 2-bases, 3-bases, 4-bases or 5-bases in length. These fragments become joined when complementary overhangs anneal and a ligase seals the join.
- Type IIS restriction endonucleases recognize up to 6-bases and cleave the DNA outside the recognition sequence to create 2-4 base overhangs.
- Many of the current endonucleases may be found in the commercial literature (including www.neb.com) provided by New England Biolabs (NEB), Ipswich, MA, including recognition sequences and length of overhang generated by cleavage.
- New endonucleases are listed in a regularly updated database (see REBASE® on www.neb.com from New England Biolabs).
- this includes: AcuI, AlwI, BaeI, BbsI, BbvI, BccI, BcgI, BciVI, BcoDI, BspMI, BfuAI, BmrI, BpmI, BpuEI, BsaI, BsaXI, BseRI, BsgI, BsmAI, BsmBI, BsmFI, BspCNI, BspMI, BspQI, BsrDI, BsrI, BtgZI, BtsCI, BtsIMutI, CspCI, EarI, EciI, Esp3I, FauI, FokI, HgaI, HphI, HpyAV, MboII, MlyI, MmeI, MnlI, NmeAIII, PleI, SapI, and SfaNI.
- SapI has a 7-base recognition sequence and cleaves DNA to produce a 3-base overhang.
- These endonucleases are all available from New England Biolabs, Ipswich, MA.
- the recognition sequences and overhangs are described by NEB along with reaction buffers, reaction temperatures and storage conditions. Isoschizomer information is also provided.
- PaqCI Type IIS restriction endonuclease
- Recognition sequences for PaqCI are added to the termini of fragments by primer dependent amplification or by chemical nucleic acid synthesis.
- the sequences adjacent to the restriction endonuclease recognition sequence create the overhangs. Preferably, these are selected so as to optimize joining of an ordered set of fragments that comprise a target polynucleotide or large DNA.
- Although PaqCI is described here in the context of ordered polynucleotide fragment assembly, this enzyme may also be used in a variety of other bioengineering methods and analysis of genomic DNA including chromatin where endonucleases with recognition sequences of six or more bases are preferred.
- the ordered assembly of multiple polynucleotide fragments into a single DNA may rely on the use of a ligase but not require a restriction endonuclease. Restriction endonucleases are not required when the polynucleotide fragments with designed overhangs are generated by chemical synthesis instead of endonuclease cleavage.
- an advantage of the two enzyme system is that a polynucleotide fragment can be inserted into plasmids that are retained in bacteria and stored indefinitely for future use. When needed, the bacteria can be readily grown to produce the desired quantity of substrate for endonuclease cleavage and ligation.
- Ordered assembly of multiple polynucleotide fragments that relies on ligation of annealed overhangs has been greatly improved by the systematic analysis of frequency of overhang ligation, bias and fidelity.
- Factors that have been identified include the length of the overhang, the number of different overhangs in a set of overhangs, the GC content of the overhangs, the bases that arise at the edges of the overhang sequence, the ligation reaction conditions and the type of restriction endonuclease that generates the overhang (see WO 2020/081768).
- sequence preferences of various ligases with robust end joining activity have been identified and found to contribute in significant ways to the frequency and fidelity of the ligation product. Differences and similarities of various ligases have been identified including the extent of mismatches tolerated between annealed overhangs, and the preference for certain patterns of A,T, G and C bases in an overhang.
- the ligases described in the examples are all end joining ligases catalyzing the formation of a phosphodiester bond between the 3'-hydroxyl of one DNA strand and the 5'-phosphorylated termini of another DNA strand. They all contain at least two domains corresponding to: a nucleotidyl transferase domain (NTase) with a catalytic lysine residue; and an oligonucleotide binding domain (OBD) having a DNA binding surface.
- the ligases also optionally contain a third domain.
- T4 DNA ligase and T3 DNA ligase both contain an N-terminal DNA binding domain while Human ligase 3 (hLig3) contains an N-terminal poly ADP-ribose polymerase-like zinc finger domain and the chlorella virus DNA ligase (PBCV-1) contains a latch domain.
- PBCV-1 chlorella virus DNA ligase
- T7 DNA ligase does not contain a third domain. While not wishing to be limited by theory, the presence or absence of a third domain may play a role in ligation bias, promiscuity and/or fidelity.
- End joining activity was analyzed for each of 256 combinations and permutations of the four base overhangs.
- matrices of 256 x 256 sequences were constructed from sequencing data obtained from ligated overhangs (see FIG. 3Ai/3Aii-3Hi/3Hii). The data from these assays were added to a computer design tool described in FIG. 6, FIG. 7, FIG. 8 and FIGs. 9A and 9B that allows a user to select a set of optimized overhang sequences for ordered assembly of a set of polynucleotide fragments.
- the computer tool described in WO 2020/081768 provides access to optimized sets of overhangs based on their annealing patterns.
- a restriction endonuclease can be selected from a menu of options for cleavage to generate overhangs.
- Metrics of ligation frequency and fidelity are provided for different overhangs using a single ligase, namely T4 DNA ligase, under assembly conditions that can be also selected from drop down menus containing buffer options and temperature and incubation time options.
- the ligation data described herein and in WO 2020/081768 captures ligase-substrate preferences and further enhances the precision of the previously described assembly options. This is especially important when large number of polynucleotide fragments (greater than about 20 fragments) are used for ordered assembly of a large DNA.
- the computer tool provides a suitable user interface for informing the user about the predicted efficiency (frequency) and fidelity profile for any fragment overhang or set of fragment overhangs under various experimental conditions.
- the data obtained on the ligation preferences of different ligases extends the menu of experimental conditions.
- the interface for the Ligation fidelity Viewer, GetSet and SplitSet containing drop down menus allows the user to select a suitable ligase for design of fragments with overhangs from a large DNA sequence in silico. Alternatively, the user can select a suitable ligase for a fixed set of overhangs.
- the different sequence preferences for ligation that result in ligation frequency and mismatch frequencies, and different fidelity profiles add a further layer of refinement and efficiency of multiple fragment assembly. Modifications to standard ligase buffers that affect ligase activity such as polyethylene glycol are also described herein.
- Tools and methods are provided to enable the assembly of larger number of fragments with greater fidelity in the assembled sequences and higher frequency of bacterial colonies transformed with destination vectors that include the assembled DNA or packaged viruses that infect lawns of bacteria.
- an intact T7 viral genome was assembled from 50 fragments.
- the newly synthesized virus was shown to produce viral plaques on a lawn of bacteria.
- the availability of the ligase data offers improvements in 24 fragment and 50 fragment assembly of at least 10%, 20%, 30%, 40% or 50% more colonies than would be possible otherwise.
- the ability to assemble small numbers of polynucleotide fragments into a larger DNA can be performed relatively efficiently without additional refinements.
- advantages associated with the assembly of larger numbers of smaller fragments such as greater than 10 fragments or as many as 20 fragments or as many as 50 fragments or greater numbers such as up to 100 fragments or more
- Such advantages include: less incidence of error occurring in small synthetic oligonucleotides than in large synthetic oligonucleotides, and the ease of stably maintaining bacterial clones that have plasmid inserts of a small size where these clones can be stored and used as needed for various assembly projects to make large DNA.
- PaqCI is characterized by a protein having at least 80% sequence identity to SEQ ID NO: 1.
- PaqCI as used herein is intended to encompass variants that have at least 90%, at least 92%, at least 95%, at least 99% sequence identity to SEQ ID NO: 1.
- PaqCI relies on multiple subunits to interact with two recognition sites in order to cleave a single target site on each strand of the DNA duplex.
- PaqCI sequence, 510 aa (SEQ ID NO:1): MPYDHNAEADFAASEVARMLVADPGLCYDAASLPASISASASYEPSAAGWPKADGLVSVLEGGTSTQRAIALEYKRPQEGIHGLLTAIGQAHG YLHKGYSGAAIVIPGRYSSHPTPAEYVRDVLNAISGSRAIAVFSYSPPDTTSPTPFAGRIQCVRPLVFDAGRVHLRPANQGPKTQWVHMREGST TRDAFFRFLQVAKRLSADPTAPRPTLRSELVAAIGRLAPGRDPIEYITNTADNKFLTKVWQFFWLEWLATPAVLTPWKLEAGVYSAPGARTRILR EDGTDFSQLWEGRVNSLKETIAGMLNRGEISEAQGWEAFVGGISATGGGQDKQGVRARAHSYREDIDSALAQ.LRWIEDDGLPTDQGYRFMT ICERYGGANSRAAIDYMGATLIQTGRYASFLHYINRLSERKFAENPLAYTK
- the recognition sequence of PaqCI is (5'-CACCTGC-3'/3'-GCAGGTG-5’) and it cuts asymmetrically 4-bases from the recognition sequence in the 3' direction and eight bases from the complement of the recognition sequence in the 5' direction resulting in a 4-base overhang (see FIG. 1C)
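The cut geometry described above can be illustrated in code. The sketch below is a simplification under stated assumptions: it scans only the strand supplied, for the recognition sequence in the forward orientation, and reports the four top-strand bases that lie between the top-strand cut (4 bases downstream of the site) and the bottom-strand cut (8 bases downstream). The function name is hypothetical.

```python
PAQCI_SITE = "CACCTGC"
TOP_CUT_OFFSET = 4     # bases between the site and the top-strand cut
BOTTOM_CUT_OFFSET = 8  # bases between the site and the bottom-strand cut


def paqci_overhangs(sequence):
    """Return (top_cut_index, overhang) for each forward-orientation site.

    top_cut_index is the 0-based index of the first base after the
    top-strand cut. Sites too close to the end of the sequence to leave a
    full overhang are skipped. Reverse-orientation sites are not searched.
    """
    sequence = sequence.upper()
    results = []
    start = sequence.find(PAQCI_SITE)
    while start != -1:
        site_end = start + len(PAQCI_SITE)
        top_cut = site_end + TOP_CUT_OFFSET
        bottom_cut = site_end + BOTTOM_CUT_OFFSET
        if bottom_cut <= len(sequence):
            results.append((top_cut, sequence[top_cut:bottom_cut]))
        # Resume one base later so overlapping occurrences are not missed.
        start = sequence.find(PAQCI_SITE, start + 1)
    return results


example = "TT" + "CACCTGC" + "AAAA" + "GGAG" + "TTTT"
print(paqci_overhangs(example))
```

In the example, the four spacer bases (AAAA) separate the site from the top-strand cut, and the next four bases (GGAG) form the overhang.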
- the activator oligonucleotide is a synthetic self-complementary single strand oligonucleotide that is folded so as to comprise a double-stranded DNA region and a single stranded DNA loop, for example a hairpin structure.
- An advantage of a hairpin over two single strands includes more complete annealing since the two ends of the single synthesized DNA strand are at exactly the same concentration.
- the double-stranded region of the activator oligonucleotide contains a binding (recognition) sequence for PaqCI and the oligonucleotide comprises unligatable 3' and 5' ends, and cannot be cleaved by PaqCI, meaning that the double stranded part of the oligonucleotide does not extend far enough beyond the recognition site to provide a cleavage site for PaqCI.
- the self-complementary oligonucleotide that comprises a double-stranded region and a loop is preferably less than 100 nucleotides in length and contains the recognition sequence (5'CACCTGC/3'GTGGACG) for PaqCI and extends no more than 0-4 bases or 1-4 bases downstream from the 5' recognition sequence.
- One unnatural extension of a blocking moiety on each strand may be added so that there are no correctly positioned phosphodiester bonds in a double-stranded region for the enzyme to cleave.
- the activator oligonucleotide may contain an uncleavable linkage.
- the 5' and 3' ends of the oligonucleotide may be flush or recessed by 1, 2, 3, 4, 5, 6, or more nucleotides, where either the 3' end or the 5' end can be recessed.
- the loop of the oligonucleotide is not critical and may be 4-20 nucleotides in some cases.
- the double-stranded region may be 10-50 base pairs in length, e.g., 10-30 base pairs in length e.g., 15-30 bases.
- the activator oligonucleotide has unligatable 3' and 5' ends that cannot be ligated to another substrate (polynucleotide fragment or activator oligonucleotide) by T4 DNA ligase or other ligase in a T4 DNA ligation buffer or other suitable ligase buffer.
- examples of unligatable 3' and 5' ends are: a 3' end that does not contain a 3' hydroxyl and a 5' end that does not contain a 5' phosphate; a 3' end that contains a 3' phosphate and a 5' end that contains a C3 spacer; or alternatively a ligation block at the 3' end such as a 3' dideoxy-C, 3' C3 Spacer (C3-OH), a C6 spacer or 3' Amino Linker (C6-NH2) and a ligation inhibiting modified base at the 5' end such as an inverted dideoxy thymine (invddT). Accordingly, ligation of the activators to each other or to the polynucleotide fragments is prevented.
- a reaction mixture containing PaqCI also includes one or more activator oligonucleotides for adding to target double stranded DNA intended for cleavage.
- PaqCI may be used in a mixture with other restriction endonucleases having different or the same specificities.
- the amounts of PaqCI and activator have been optimized to fall within a range that produces substantially complete cleavage of DNA substrate by PaqCI but no star activity.
- the ratio of PaqCI to activator was found to be more significant for optimization of enzyme activity than the ratio of activator to the recognition site on the target oligonucleotides.
- Insufficient concentrations of activator relative to PaqCI resulted in incomplete cleavage of the target DNA and star activity. Too much activator resulted in incomplete cleavage.
- the optimal amount of the activator for a certain amount of PaqCI may vary according to its intended use. Standard restriction digest with PaqCI that does not involve complex assembly reactions in the same tube can be achieved using 1 µl of the enzyme (10 U) and 1 µl of the activator (20 pmoles). In these reactions, once the DNA substrate has been cleaved, it does not readily reassemble.
- the range may be selected from any of 0.75 pmole to 9 pmole activator /Unit PaqCI, 1 pmole to 7.5 pmole activator /Unit PaqCI , 1 pmole to 5 pmole activator/Unit PaqCI, 1.5 pmole to 7.5 pmole activator /Unit PaqCI, 1.5 pmole to 5 pmole activator/Unit PaqCI, 1.5 pmole to 4 pmole activator /Unit PaqCI, 2 pmole to 5 pmole activator/Unit PaqCI or 2 pmole to 4 pmole activator /Unit PaqCI.
- One unit is defined for this ratio as the amount of enzyme required to digest 1 µg of λ DNA in 1 hour at 37°C in a total reaction volume of 50 µl in 1X rCutSmart™ Buffer (50 mM Potassium Acetate, 20 mM Tris-acetate, 10 mM Magnesium Acetate, 100 µg/ml Recombinant Albumin, pH 7.9 @ 25°C).
- the unit definition and description of buffer is not intended to be limiting but rather to serve as a guideline for developing the appropriate ratios of activator/PaqCI.
- Other buffers may be utilized depending on uses including selected ligases. For example, a commercial T4 DNA ligase buffer may be preferable for DNA fragment assembly methods.
- a standard reaction volume is 50 µl and contains 1 pmole to 8 pmole activator (20 nM to 160 nM) per unit of PaqCI endonuclease or 10 pmole to 80 pmole (200 nM to 1600 nM) of activator for 10 units of enzyme.
- the DNA in the reaction mix was 1 µg of lambda DNA.
- the activator concentration is at 20 µM with an enzyme concentration at 10 units/µl such that the optimum amounts of enzyme and activator in a 50 µl reaction are obtained at a 1:1 ratio using 1 µl of each. For typical PaqCI reaction conditions this results in complete cutting of the DNA substrate recognition sites, even though the concentration of the substrate sites will vary depending on the DNA being cut.
- a lower ratio of activator to enzyme could be used (for example, 1:2, 1:3, 1:4, or 1:5) where the concentration of substrate sites is higher compared to typical reaction conditions.
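The activator and enzyme amounts quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the described method: the function names are hypothetical, the stock values (20 µM activator, 10 units/µl enzyme, 1 µl of each in 50 µl) are the ones stated in the text, and the range check uses the broadest range given above (0.75 to 9 pmole activator per unit).

```python
def activator_per_unit(activator_uM, activator_ul, enzyme_units_per_ul, enzyme_ul):
    """Return pmol of activator per unit of enzyme (hypothetical helper)."""
    pmol = activator_uM * activator_ul          # 1 uM x 1 ul = 1 pmol
    units = enzyme_units_per_ul * enzyme_ul
    return pmol / units


def final_concentration_nM(pmol, reaction_ul):
    """Convert an amount in pmol to a final concentration in nM."""
    return pmol * 1000.0 / reaction_ul          # pmol/ul = uM; x1000 gives nM


def within_range(pmol_per_unit, low=0.75, high=9.0):
    """Check against the broadest activator/unit range quoted in the text."""
    return low <= pmol_per_unit <= high


ratio = activator_per_unit(20.0, 1.0, 10.0, 1.0)   # 20 pmol / 10 U
conc = final_concentration_nM(20.0, 50.0)          # 20 pmol in 50 ul
print(ratio, conc, within_range(ratio))
```

With the stated stocks this gives 2 pmole of activator per unit at 400 nM, which sits inside the 200 nM to 1600 nM window quoted for 10 units of enzyme.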
- PaqCI or a variant thereof may be combined with an activator, a ligase and a plurality of DNA substrates in a reaction mix.
- the DNA substrates are contained in plasmids that contain PaqCI recognition sequences at the insertion sites with adjacent plasmid sequences that have been designed for ligation assembly of the substrates.
- every insert and every destination plasmid has an assembly active DNA fragment flanked by two sites.
- the reaction mix may be incubated at a time and temperature suitable for endonuclease cleavage and ligation of fragments (for example at 37°C and 60°C for 30-60 ligation cycles where each cycle is 1-5 minutes depending on the number of fragments in the mix).
- the desired reaction product is a large DNA molecule formed from the plurality of DNA substrates.
- Different levels of complexity of fragment assembly call for different levels of PaqCI and DNA ligase as described above. As assembly reactions increase in complexity, more units of enzyme are required for maximal performance; for example, using T4 DNA ligase, 2.5 to 20 U of PaqCI can be used with 200-800 U of the ligase, with the upper range of 10-20 U of PaqCI and 400-800 U of DNA ligase being preferred for assembly of 20 or more fragments.
- PaqCI cuts to completion and does not have star activity when combined with activator (see FIG. 1D). It has greatly improved performance when compared with AarI (see FIG. 1A and 1B).
- kits are provided containing reagents in a mixture or in one or more containers, the reagents including PaqCI or variants thereof ("PaqCI") and activator molecules.
- the kits may further include a ligase.
- the kits may include the reagents in a reaction buffer or one or more of the reagents may be lyophilized and/or immobilized on a suitable substrate such as beads or a polymer matrix together or separately.
- the kit may additionally contain reaction buffer in a separate container for adding to the reagents.
- Multi-fragment assembly can be achieved by combining PaqCI with a selected ligase to generate fragments with 4-base overhangs.
- the ordered assembly depends on the fidelity of annealing of the overhangs and the promiscuous nature of ligation by ligases of all annealed overhangs which in turn depends on the conditions of ligation including the number of fragments to form a scarless contiguous DNA.
- Embodiments of the invention establish the role of various ligases in intrinsic ligase preference versus ligation associated annealing.
- T4 DNA ligase is the standard ligase for end ligations and large DNA assembly. However, it was unknown whether this ligase had sequence preferences that contributed to variable ligation profiles observed for end joined fragments having certain 4-base overhangs. Moreover, it was unknown how T4 DNA ligase compared in this respect to other ligases.
- In Example 1, the frequency of ligation, bias and fidelity profiles of the DNA ligases T4 DNA Ligase, T3 DNA Ligase, T7 DNA Ligase, PBCV-1 DNA Ligase, and hLig3 were determined using a library of end-joining hairpin DNA substrates containing degenerate 5' four-base overhang ends. The ligation products of these libraries were analyzed by sequencing. The number of reads for each overhang provided a value for ligation efficiency; the sequence bias for each ligase was inferred from the relative frequency of each overhang appearing across all ligation products.
- the hairpin substrate in the assay presents a complex equilibrium system that mimics the actual assembly of multi-fragments wherein ligation requires a ligase finding complementary ends of fragments. A rapid conversion to ligated product would be predicted if there were only two Watson-Crick binding partners in the reaction.
- the assay method provided a depth of information not available by separately examining individual overhangs and permitted a more rapid appraisal of fidelity and bias than would have been possible through testing each pairing in parallel.
- the raw data for the frequency of each ligation of every complementary 4-base overhang is presented in a heat map (matrix) in FIG. 3Ai/3Aii to FIG. 3Hi/3Hii for different ligases under the same assay conditions.
- T7 DNA ligase showed the highest degree of sequence bias, preferring to ligate perfect matches of bases in the 4-base overhangs. All other ligases examined had a much tighter distribution of ligation frequencies, but with differences in how tightly the data points are clustered around the average (see FIG. 4A). Both T4 DNA ligase and hLig3 showed the least amount of bias, with the range of values more than two-fold smaller compared to T7 DNA ligase. PBCV-1 and T3 had a similar average ligation frequency but a slightly larger range of observed ligation frequencies.
- T4 DNA ligase displayed moderate fidelity (72% correct ligation products).
- T4 DNA ligase, T3 DNA ligase, PBCV-1 ligase and hLig3 had a broad range of fidelity for individual overhang sequences, with some overhangs having very few mismatch ligation events and others with frequent mismatch ligations (FIG. 4B). For many overhangs, even when presented with all possible partners, ligation products were almost exclusively with the Watson-Crick partner.
- hLig3 may be selected to increase the chance of a fragment in a set of fragments ligating to another fragment in an incorrect order because of the promiscuity of the ligase.
- T7 DNA ligase may be the enzyme of choice.
- additives to the ligation buffer such as PEG may somewhat enhance frequency of ligation without significant loss of fidelity so that T7 DNA ligase might be the ligase of choice for a 20+ fragment assembly workflow where otherwise this ligase might be less desirable.
- PEG Polyethylene glycol
- Millipore Sigma, Burlington, MA Polyethylene glycol
- PEG MW may be selected from 500, 600, 800, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500, 6000, 6500, 7000, 7500, 8000, 8500, 9000 and 10,000.
- Example 4 shows results with PEG 600, PEG 3350 and PEG 6000 at 6.8% w/v.
- the addition of PEG increased the overall library yield for both T4 DNA ligase and T7 DNA ligase (from 61% to 73% and from 20% to 45%, respectively) and there was a slight decrease in the yield of hLig3 (from 77% to 72%).
- the addition of PEG moderately decreased the overall fidelity of the multiplex ligation reaction for T4 DNA ligase from 72% correct ligation events in the absence of PEG to 67% in the presence of PEG (see Table 2, FIG. 5A-5E).
- the addition of PEG decreased fidelity by the same amount regardless of GC content, except for overhangs with 100% GC content which did not see a change in average fidelity.
- Typical reaction conditions and additives may impact different DNA ligases and provide insight on modifications that might improve particular application outcomes. For example, for applications such as cloning or adaptor ligation, the boost in ligation product yield from adding PEG will likely outweigh the moderate loss of fidelity for T4 DNA Ligase and T7 DNA Ligase. However, for applications involving highly complex multifragment assembly, the loss of fidelity observed when adding PEG may require more consideration of the particular overhangs used to limit potential mismatch ligation among a particular overhang set. The addition of PEG makes T7 DNA ligase a more attractive candidate for large multi-fragment assemblies. The observed gains in efficiency for additional overhangs expand the pool of efficient potential overhang sequences, while the small loss in fidelity is tolerable due to the high overall fidelity of this enzyme.
- Data Optimized Assembly Design.
- Ligase Fidelity Viewer, GetSet and SplitSet tools are described here and in WO 2020/081768; they provide data-optimized assembly design that greatly improves the success of ordered assembly of fragments. These tools rely on menu choices to assist the user.
- the experimental conditions described below each result from a 256x256 data matrix of ligation frequency and fidelity.
- the computer tool can utilize this data to generate optimized overhangs for the desired number of fragments and type of overhang.
- the identification of suitable overhangs includes one or more steps. Certain rules have been applied that include no palindromic overhangs, no duplicate overhangs, no overhangs with 3-bases in a row e.g., ACCA and ACCG: no more than 2-bases in the same position e.g., (ACGC and ATGG) and avoidance of overhangs with 0% GC overhangs and 100% GC overhangs (Nilsson et al.
- Embodiments describe how to obtain an optimized data set by profiling end-joining hybridization and ligation fidelity and bias to predict highly accurate sets of connections for ligation-based DNA assembly methods. This presents a significant improvement over the prior art rules that restrict the user to a limited number of 4-base overhangs, which is particularly constraining when sequences may not be chosen arbitrarily (e.g., when assemblies must break within coding sequences).
- Application of the ligation fidelity profile permitted an informed choice of junctions and enabled a highly flexible assembly design using more than 20 fragments in a single reaction.
- the computer design tool provides a selection of overhangs after the user inputs various requests.
- the computer tool receives selection or input of experimental conditions under which the assembly reaction is expected to occur (see for example Table 1).
- the experimental conditions will change the choice of ligation frequency tables and individual overhang ligation fidelity determinations accessed by the computer.
- the computer tool will also receive numbers, for example: (i) a desired number of overhangs for an assembly reaction; (ii) a length of the overhangs; (iii) the nucleotide sequence of the assembly; (iv) the set of intervals in which the nucleotide sequence of (iii) can be cleaved, causing the method to be executed and then receiving as output, an identified set of overhangs and/or receiving a set of fragments for the assembly, where the ends of the fragments are defined by the identified set of overhangs, depending on which information is input into the interface.
- This method may further include receiving instructions regarding the ligation conditions for ligating the set of overhangs or fragments containing the same.
- Ligation conditions may include one or more of a desired ligase, buffer conditions such as salt concentration, a temperature, temperature range and/or thermocycling times and temperature (which may be used for cleavage and ligation) and restriction endonuclease if used for generating overhangs.
- a ligation frequency table that corresponds to the specified conditions may be selected from multiple ligation frequency tables, each for a different ligation condition. After the ligation frequency table is selected, then the ligation fidelity scores can be calculated using data in that table.
- the number of overhangs may be in the region of 5-200, e.g., 10-100, e.g. 10-50, although the number of overhangs can be outside of these ranges in some circumstances.
- the length of the overhangs may be 2, 3, 4 or 5 nucleotides, where the length of the overhangs is limited only by the choice of restriction endonuclease or other means for generating the overhang and the frequency and fidelity of possible ligation reactions.
- the choice regarding the preferred length of overhang may be subject to the number of possible overhangs for any combination of nucleotides in the overhang where this number should exceed the number of fragments to be joined.
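The relationship between overhang length and the number of usable junctions can be made concrete by counting. The sketch below is illustrative only: the function names are hypothetical, and it assumes that each junction consumes one overhang together with its reverse complement and that palindromic overhangs are excluded. It gives an upper bound that ignores ligation fidelity and frequency.

```python
def count_overhangs(length):
    """Return (all sequences, palindromes, usable complementary pairs)."""
    total = 4 ** length
    # A sequence equal to its own reverse complement is fixed by its first
    # half, and cannot exist for odd lengths (no base is its own complement).
    palindromes = 4 ** (length // 2) if length % 2 == 0 else 0
    pairs = (total - palindromes) // 2
    return total, palindromes, pairs


def shortest_length_for(n_junctions, lengths=(2, 3, 4, 5)):
    """Smallest overhang length whose usable pairs cover n_junctions, or None."""
    for length in lengths:
        if count_overhangs(length)[2] >= n_junctions:
            return length
    return None


for n in (2, 3, 4, 5):
    print(n, count_overhangs(n))
```

For 4-base overhangs this counts 256 sequences, 16 palindromes and 120 complementary pairs; in practice the usable number is smaller because fidelity and efficiency also constrain the choice.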
- After the desired number of overhangs for an assembly reaction and the length of the overhangs have been received, the computer provides a set of overhangs of the selected length (e.g., 2, 3, 4 or 5 bases) from an overhang table. For example, if a user were to input, into the computer, 20 overhangs each 4 bases long, then the computer would output a set of 20 unique overhangs that did not include duplicates, complements, palindromes (e.g., GATC) or excluded sequences.
- GATC is an example of a palindromic sequence since its reverse complement is GATC.
- Palindromes should be avoided because any one fragment with palindromic ends could anneal to another identical molecule resulting in the disruption of ordered assembly.
- the interface may permit receiving a list of one or more overhangs that should be excluded or included. Overhangs that are excluded may be selected because of poor fidelity or frequency of ligation profiles or because the overhangs have been used elsewhere in a reaction. Included overhangs may be selected on the basis of experimental findings of their high fidelity and/or frequency values.
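The set-level constraints described above (no duplicates, no complements, no palindromes, no excluded sequences) can be expressed as a simple check. This is a minimal sketch under stated assumptions: `validate_set` and its return format are hypothetical and are not the interface of the tools named in this document.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(overhang):
    """True if the overhang equals its own reverse complement (e.g. GATC)."""
    return overhang == reverse_complement(overhang)


def validate_set(overhangs, excluded=()):
    """Return a list of (overhang, reason) for every rule violation found."""
    excluded = {e.upper() for e in excluded}
    problems, seen = [], set()
    for oh in overhangs:
        oh = oh.upper()
        if is_palindromic(oh):
            problems.append((oh, "palindrome"))
        if oh in excluded:
            problems.append((oh, "excluded"))
        if oh in seen:
            problems.append((oh, "duplicate"))
        elif reverse_complement(oh) in seen:
            problems.append((oh, "complement of an earlier overhang"))
        seen.add(oh)
    return problems


print(validate_set(["GATC", "AAGT", "ACTT", "AAGT", "GGAG"], excluded=["GGAG"]))
```

An empty list means the candidate set passes these structural rules; it says nothing yet about ligation fidelity, which is scored separately.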
- the overhang ligation frequency table may be stored in computer memory and can include all possible overhangs of the desired length or a subset of the same. For example, for a 4-base overhang the overhang table may contain 256 4-base sequences; for a 3-base overhang the overhang table may contain 64 3-base sequences.
- the overhangs may be selected in any particular order. For example, in some embodiments the overhangs may be selected randomly whereas in other embodiments the overhangs may be selected in a defined order.
- the computer calculates a ligation fidelity score for each individual overhang and its complement in the set. For example, if there are 20 overhangs in the set, then there should be 20 ligation fidelity scores, where the ligation fidelity score of each individual overhang represents the frequency at which the individual overhang and its complement independently ligate to a perfectly complementary overhang relative to all overhangs in the set and their complements. For example, if a particular overhang and its complement ligate together with perfect complementarity 90% of the time relative to all overhangs in the set and their complements, then that overhang may have a calculated ligation fidelity score of 0.9.
- Ligation conditions can be selected using a drop-down menu, where the ligase options laid out in the drop-down menu include different ligation frequency tables. Experimental conditions that, in addition to the selection of ligase, were found to affect ligation efficiency, fidelity and yield, and therefore the experimentally determined values for frequency and fidelity of overhang ligation for ordered assembly of fragments, include the following:
- Buffer types including salt concentrations
- Cofactors such as crowding agents, repair enzymes and/or deadenylases (also see Tables 4 and 5);
- the pull-down menu options for experimental conditions in the user interface for Ligation Fidelity Viewer, GetSet and SplitSet in FIGs. 9B-9D are shown in Table 1.
- Example 1 describes in detail how the data was collected for a comparative study of 5 ligases.
- a ligation frequency table for a 4-base overhang should have an experimental value for each of all possible combinations of overhangs, i.e., 256 x 256 / 2 datapoints, each value indicating the frequency of ligation of two overhangs under the defined experimental conditions. Details of how this data can be obtained are described in Example 1 as well as in Potapov, V. et al. (2016), ACS Synth. Biol., vol 7, p2665-2674; Potapov et al. Nucleic Acid Res 2018, 46 e79; Potapov et al. (2016) BioRxiv; Pryor, J. M. et al. (2020) PLoS One, e8592; Pryor, J. M. et al.
- the ligation fidelity score for an individual overhang can be calculated as the number of ligation events that occur between the individual overhang and its complement relative to the total number of ligation events that occur between (i) the individual overhang and all of the overhangs in the set and their complements; and (ii) the complement of the individual overhang and all of the overhangs in the set and their complements.
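The per-overhang score defined above can be sketched as follows. This is an illustration under stated assumptions rather than the implementation of the described tools: the frequency table is represented as a hypothetical dictionary of pair counts that is treated as symmetric, the counts are invented for the example, and the correct pairing is counted once in the denominator so that an overhang with no mismatch events scores 1.0.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def rc(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count(freq, a, b):
    """Look up a pair count in either orientation (symmetric table assumed)."""
    return freq.get((a, b), freq.get((b, a), 0))


def fidelity_scores(freq, overhangs):
    """Correct events / all events involving the overhang or its complement."""
    pool = set(overhangs) | {rc(o) for o in overhangs}
    scores = {}
    for oh in overhangs:
        comp = rc(oh)
        correct = count(freq, oh, comp)
        total = correct                      # the correct pairing, counted once
        for partner in pool:
            if partner != comp:              # (i) overhang with any other partner
                total += count(freq, oh, partner)
            if partner != oh:                # (ii) complement with any other partner
                total += count(freq, comp, partner)
        scores[oh] = correct / total if total else 0.0
    return scores


# Invented counts for illustration only (not measured data).
toy_freq = {
    ("AAGT", "ACTT"): 90,   # AAGT with its Watson-Crick partner
    ("GAGT", "ACTC"): 95,   # GAGT with its Watson-Crick partner
    ("AAGT", "ACTC"): 10,   # one-mismatch cross-ligation
}
print(fidelity_scores(toy_freq, ["AAGT", "GAGT"]))
```

In this toy table AAGT ligates correctly in 90 of 100 events, giving the 0.9 score used as the example in the text; the cross-ligation also lowers the score of GAGT because its complement ACTC takes part in the mismatch.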
- the overall fidelity score for the set of overhangs can then be generated based on the calculated ligation fidelity score for each of the individual overhangs, as output above.
- the individual ligation fidelity scores may be multiplied together to obtain the overall fidelity score. For example, if there are 20 overhangs that each have a fidelity of 0.950, then the overall fidelity score for that set of overhangs may be 0.36 (i.e., 0.95^20).
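The multiplication of individual scores can be reproduced directly. The snippet below is a small sketch (the function name is hypothetical) that shows how quickly the overall score falls as junctions are added at a fixed per-overhang fidelity.

```python
import math


def overall_fidelity(scores):
    """Product of the individual ligation fidelity scores."""
    return math.prod(scores)


for n in (5, 10, 20):
    print(n, round(overall_fidelity([0.95] * n), 2))
```

With 20 overhangs at 0.95 each the product rounds to 0.36, matching the worked example in the text.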
- this calculation may, in addition, weight overhangs by how efficient an overhang is at ligating to its complement. For example, in some cases, two overhangs may have equal fidelities, but one ligates to its complement more efficiently than the other under the conditions used.
- the overhang that ligates with a higher efficiency may have a higher weight than the other.
- the overall fidelity score may be calculated using (i) the calculated ligation fidelity score for each of the individual overhangs and (ii) the yield that each of the individual overhangs ligates to a perfectly complementary overhang.
- the process may be repeated for another set of overhangs to calculate a plurality of overall fidelity scores, each for a different set of overhangs.
- the sets of overhangs selected in the iterated steps are different from one another (and different from the first set of overhangs).
- the selection may be random or in a defined order. In some embodiments, these steps may be iterated using a Monte Carlo simulation. In this method, at least 100, at least 1,000 or at least 10,000 overall fidelity scores may be generated, each for a different set of overhangs. This part of the method may be repeated until an overall fidelity score has been assigned to all possible combinations of overhangs or until one or more sets of overhangs have been identified that have an overall fidelity score above a threshold.
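The iterated random selection can be sketched as a simple Monte Carlo search. Everything below is illustrative: `toy_count` is a SYNTHETIC stand-in for a measured ligation frequency table (perfect matches count 100, single mismatches count 5), and the function names, iteration count and threshold parameter are assumptions, not the behavior of the tools named in this document.

```python
import itertools
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def rc(seq):
    return seq.translate(COMPLEMENT)[::-1]


def toy_count(a, b):
    """SYNTHETIC pair count (assumption): depends only on mismatches to rc(a)."""
    mismatches = sum(x != y for x, y in zip(rc(a), b))
    return {0: 100, 1: 5}.get(mismatches, 0)


def set_fidelity(overhangs):
    """Overall score: product of per-overhang correct/total ratios."""
    pool = set(overhangs) | {rc(o) for o in overhangs}
    score = 1.0
    for oh in overhangs:
        comp = rc(oh)
        correct = toy_count(oh, comp)
        total = correct
        for partner in pool:
            if partner != comp:
                total += toy_count(oh, partner)
            if partner != oh:
                total += toy_count(comp, partner)
        score *= correct / total
    return score


def monte_carlo(n_overhangs, iterations=1000, threshold=None, seed=0):
    """Sample random sets and keep the best; stop early above the threshold."""
    rng = random.Random(seed)
    all_seqs = ("".join(p) for p in itertools.product("ACGT", repeat=4))
    # One representative per complementary pair; this also drops palindromes.
    candidates = [s for s in all_seqs if s < rc(s)]
    best_set, best_score = None, -1.0
    for _ in range(iterations):
        trial = rng.sample(candidates, n_overhangs)
        score = set_fidelity(trial)
        if score > best_score:
            best_set, best_score = trial, score
        if threshold is not None and best_score >= threshold:
            break
    return best_set, best_score


best_set, best_score = monte_carlo(10, iterations=500)
print(best_set, round(best_score, 3))
```

Random sampling is only one way to iterate; the text also allows a defined order or an exhaustive pass over all combinations, which becomes impractical as the set size grows.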
- the method may comprise identifying the set of overhangs that has a suitable overall fidelity score (Examples of sets of overhangs are provided in Table 7 and Example 5).
- the identified set of overhangs may have an overall fidelity score that is in the top 50%, top 20%, top 10% or top 5% of overall fidelity scores.
- the identified set of overhangs may have the highest overall fidelity score or a score that is in the top 10% or top 5% highest fidelity scores.
- the selected set of overhangs may be output from the computer onto, e.g., a display (see Example 5 and FIGs. 9B-9D).
- the method may comprise a user inputting into an interface, one or more of the following: (i) the desired number of overhangs for an assembly reaction; (ii) the length of the overhangs; optionally, (iii) the nucleotide sequence of the assembly; (iv) the set of intervals in which the nucleotide sequence of (iii) can be cleaved, causing the method to be executed and then receiving as output, an identified set of overhangs and/or receiving a set of fragments for the assembly, where the ends of the fragments are defined by the identified set of overhangs, depending on which information is input into the interface.
- This method may further include receiving instructions regarding the ligation conditions for ligating the set of overhangs or fragments containing the same, and, optionally, thermocycling conditions for producing the fragments and ligating them together.
- the method may comprise making a set of double stranded nucleic acids that have a set of overhangs that has an overall ligation score that is at or above a threshold, and their complements, and then ligating the fragments together in a single reaction to produce an assembly, wherein in the reaction the overhangs determine the order of the fragments in the assembly.
- the ligating may be done by overhang-directed ligation, which will be explained in greater detail above and/or below.
- the method may further comprise receiving selected experimental conditions for ligation.
- Ligation Fidelity Viewer, GetSet and SplitSet have been described in detail in WO 2020/081768.
- the user interface for each of these applications is provided in FIGs 9A-9D.
- the ligase data provided in the examples is an additional feature of the experimental conditions as discussed above that enables refinement of the optimized set of overhangs. This is particularly useful for large sets of overhangs with corresponding large sets of fragments for ordered assembly.
- In GetSet (see FIG. 9C), the overhang length is selected, the total number of overhangs is entered, required overhangs are entered, excluded overhangs can also be added, and experimental conditions can be selected, including the use of PaqCI and a selection of ligases. GetSet will then provide a set of overhangs best suited for the ligation conditions specified.
- a first step may include receiving a nucleotide sequence of an intended assembly and a set of intervals (e.g., at least 5, at least 10, at least 20 or at least 30, up to 50 or more intervals) in which the nucleotide sequence can be cleaved (in addition to the desired number of overhangs for an assembly reaction and the length of the overhangs).
- the input sequence may be, for example, any sequence that is at least 500 bases in length, although sequences as short as 25 nucleotides could be selected providing a Type IIS restriction endonuclease recognition sequence is present at the beginning and end of that interval.
- the method may include receiving a sequence and multiple sets of beginning and end coordinates, where each set of beginning and end coordinates defines an interval in which the sequence can be cleaved.
- only overhangs that are in the intervals may be selected from the overhang table such that, together, each interval is represented by a selected overhang.
- a non-redundant set of sub-sequences are then identified in the intervals that are the same length as the received overhang length. These sub-sequences may be stored as the overhang table itself or only sequences from the non-redundant set of sub-sequences will be selected from an overhang table (see Tables 8 and 9).
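The identification of non-redundant sub-sequences within the intervals can be illustrated with a short sketch. The function name, the use of 0-based half-open coordinates and the example sequence are assumptions made for the illustration.

```python
def candidate_overhangs(sequence, intervals, length=4):
    """Map each (start, end) interval to its unique sub-sequences of `length`.

    Coordinates are assumed to be 0-based and half-open, i.e. sequence[start:end].
    """
    table = {}
    for start, end in intervals:
        region = sequence[start:end].upper()
        subs = {region[i:i + length] for i in range(len(region) - length + 1)}
        table[(start, end)] = sorted(subs)
    return table


def non_redundant(table):
    """Union of the candidates from all intervals, without repeats."""
    return sorted({s for subs in table.values() for s in subs})


seq = "ATGACCGGTTAACC"            # invented example sequence
table = candidate_overhangs(seq, [(0, 6), (8, 14)])
print(table)
print(non_redundant(table))
```

The resulting candidates would then be filtered (for example, TTAA in the second interval is palindromic) and scored so that each interval ends up represented by one selected overhang.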
- the intervals may be input into the computer by a user, e.g., by inputting the intervals into an interface (see FIG 9D).
- a user may input a sequence and specify how many fragments are desired.
- an algorithm may determine approximate positions at which the input sequence may be split to produce the desired number of fragments, and then identify intervals (which may be, e.g., 10-50 or 10-100 nucleotides in length) that contain the approximate positions.
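One simple way to derive approximate split positions and surrounding intervals is to divide the sequence evenly. This is a sketch of one possible approach, not the algorithm used by SplitSet: the function name, the even spacing and the default 20-nucleotide window are assumptions, and a linear assembly of n fragments is taken to have n - 1 internal junctions.

```python
def split_intervals(seq_length, n_fragments, window=20):
    """Return (start, end) intervals centred on evenly spaced split points."""
    if n_fragments < 2:
        return []
    half = window // 2
    intervals = []
    for k in range(1, n_fragments):
        position = round(k * seq_length / n_fragments)
        start = max(0, position - half)
        end = min(seq_length, position + half)
        intervals.append((start, end))
    return intervals


print(split_intervals(1000, 4))
```

For a 1000-base sequence split into 4 fragments this centres intervals on positions 250, 500 and 750; the window can be widened toward the 10-100 nucleotide range mentioned in the text to give the overhang search more candidates.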
- the intervals may be processed as described above.
- the method may further comprise splitting the nucleotide sequence of the assembly at the identified overhangs, thereby producing a set of fragments of the assembly, where the ends of the fragments are defined by the identified overhangs.
- the SplitSet interface is shown in FIG. 9D, where the desired overhang length is provided by selecting an item in the menu. Ligation conditions are then selected just as with the Ligation Fidelity Viewer, the nucleotide sequence is inputted, and the number of fragments is entered. The computer will then provide the results for the optimized set of fragments for ordered assembly.
- Embodiments are provided herein for enabling a user of a computer to review by means of a graphical representation, the ligation fidelity profile expected from a predetermined set of fragment overhangs under selected experimental conditions.
- Each of these features can be modified by adjusting any of the parameters described herein to provide a revised graphical representation and to determine whether the change improved the ligation fidelity profile for the selected number of overhang sequences using the graphical representation of the deviation from perfect score obtained for the set of overhang sequences.
- a first database may be the product of analysis of annealed overhangs where an example of an assay is provided in FIG. 2A-2C and Example 1.
- a second database may be derived using the same assay to provide data on frequency and fidelity of ligation by different ligases that recognize different 4-base overhangs and have different or similar biases.
- the complete set of overhangs may include overhangs of different sizes.
- the nucleic acids include DNA, RNA or DNA/RNA hybrids or chimera. While DNA may be specifically mentioned in the description, examples and claims for convenience, embodiments herein are not limited to DNA but may be applied to any type of nucleic acid as described above.
- Factors for determining an appropriate length of overhangs include: how many fragments are desired to be joined where the longer the overhang, the larger the set of possible combinations. This enables more fragments, each with a unique overhang complementary to its adjacent fragment overhang, to be joined to form a target polynucleotide. Other factors include the efficiency of melting/annealing where shorter overhangs melt and anneal faster and longer overhangs require higher melting temperatures. Ligation efficiency is another factor where longer overhangs may ligate more efficiently than shorter overhangs. Ligation efficiency also depends on the characteristics of the nucleotides singly or together in the overhang where some sequences are more efficiently hybridized and/or ligated to form a junction than others, have reduced bias and do not favor or induce mismatches.
- the output from the system instructs the user which restriction endonucleases should be used to cleave the nucleic acid to generate overhangs having sequences that have been optimized for ligation fidelity or selected for a chosen ligation fidelity.
- cleavage enzyme systems can be used such as uracil-specific excision reagent (USER®, New England Biolabs, Ipswich, MA), argonautes, clustered regularly interspaced short palindromic repeats (CRISPR) or other cleavage enzymes can be used to generate overhangs.
- the experimental conditions discussed above are offered by menu from the computer interface to the user and then selected by the user or selected by the computer that has computed all the various parameters for the assembly and provides the best conditions for efficient joining all the fragments in a set correctly.
- the use of a Type IIS restriction enzyme enables the precise selection of a site where the DNA will be broken and enables exclusion of the restriction enzyme recognition sequence from the final construct (thus enabling seamless one-tube assembly reactions). For certain types of nucleic acid assembly, for example for gene coding regions, scarless junctions which do not alter the DNA sequence are important. In other applications, for example, cistron formation, additional or altered nucleotides that may remain from an assembly reaction may not interfere with the gene expression of the target nucleic acid.
- the endonucleases suitable for use in generating overhangs and scarless junctions include:
- 2-base overhang generators e.g., BtsI and isoschizomers thereof, AcuI and isoschizomers thereof
- 3-base overhang generators e.g., SapI and isoschizomers thereof and BspQI and isoschizomers thereof (both 7-base recognition)
- 4-base overhang generators e.g., BsaI-HFv2 and isoschizomers thereof (6-base recognition), BbsI and isoschizomers thereof (6-base recognition), BsmBI and isoschizomers thereof (6-base recognition), PaqCI (7-base recognition) and
- 5-base overhang generators e.g., HgaI and isoschizomers thereof with a 5-base recognition site.
- restriction endonucleases as described in the New England Biolabs 2017/2018 catalog and isoschizomers thereof may be used for those assembly reactions that are not required to be scarless.
- 2-base overhangs generate a 16x16 matrix data table
- 3-base overhangs generate a 64x64 matrix data table
- 4-base overhangs generate a 256x256 matrix data table
- 5-base overhangs generate a 1024x1024 matrix data table
- 6-base overhangs generate a 4096 x 4096 matrix data table.
- the upper limit of overhang length using a Type IIS restriction endonuclease may be 5, 6, 7 or 8-bases in length.
- the number of bases in an overhang may be as much as the user desires based on the positioning of a uracil.
- the optimized sets of Watson-Crick pair overhangs include overhang pairs that can ligate with their exact complementary partner efficiently, are not palindromes, and are unique within the set. Other overhang pairs are acceptable as long as, preferably, no individual overhang forms a ligation product with an overhang partner containing one or more mismatches, but preferably no more than one mismatch.
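- A minimal sketch of the basic set rules stated above (no palindromes, each overhang unique, and no overhang equal to the reverse complement of another member) might look as follows. The function names are assumptions made for illustration; the mismatch-ligation criterion requires experimental ligation data and is not modeled here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindrome(overhang):
    """A palindromic overhang is its own Watson-Crick partner."""
    return overhang == reverse_complement(overhang)


def check_overhang_set(overhangs):
    """Return a list of rule violations; an empty list means the set passes."""
    problems = []
    seen = set()
    for oh in overhangs:
        if is_palindrome(oh):
            problems.append(f"{oh} is palindromic")
        if oh in seen:
            problems.append(f"{oh} is duplicated")
        elif reverse_complement(oh) in seen:
            problems.append(
                f"{oh} is the reverse complement of {reverse_complement(oh)}"
            )
        seen.add(oh)
    return problems


print(check_overhang_set(["GGAG", "TCGC", "CAGT"]))
print(check_overhang_set(["GGAG", "CTCC", "ACGT"]))
```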
- the highest fidelity set of overhangs with good ligation fidelity can be provided by the computer for any chosen number of junctions (such as 10 junctions, 12 junctions, 15 junctions, 20 junctions etc.). The greater the number of junctions the lower the mean maximal ligation fidelity for the set of overhang pairs.
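- The trend described above can be illustrated under one simplifying assumption that is not stated in the text: that the overall assembly fidelity is approximated by the product of the individual junction fidelities. The per-junction value of 0.99 is an arbitrary placeholder, not measured data.

```python
import math


def assembly_fidelity(junction_fidelities):
    """Assumed model: overall fidelity is the product of per-junction fidelities."""
    return math.prod(junction_fidelities)


# Placeholder per-junction fidelity of 0.99, chosen only to show the trend.
for n_junctions in (10, 12, 15, 20):
    estimate = assembly_fidelity([0.99] * n_junctions)
    print(f"{n_junctions} junctions: estimated fidelity {estimate:.3f}")
```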
- overhangs are created using alternate enzymes such as nicking agents for example, USER (also see for example US 7,435,572), or EndoMS suitable for creating overhangs in DNA fragments; and argonautes and Cas cleavage enzymes suitable for overhangs in DNA and RNA, where these enzymes utilize guide DNAs or RNAs.
- Embodiments of the methods permit the user to receive a computational output that provides optimized sets of overhangs based on a measure of the net effect of cutting, melting, annealing, and ligation for a particular combination of cleaving enzyme and one or more ligases under a given set of cycling conditions, where some or all of these features are provided by the user.
- the output can then provide a relative ligation efficiency and/or ligation fidelity value for every overhang pairing.
- the computational output may additionally provide for the user an optimized protocol for performing an assembly to obtain a desired overall ligation fidelity detailing at least one of temperature, time for hybridization, cycling conditions for ligation, and buffer.
- the computational output may include a graphical output of features that include one or more of the following: (1) the entire assembled sequence with the junction sites highlighted; (2) a map of input fragments with individual cut sites indicated on the fragment where the set of cut sites have been determined computationally to yield the optimal set of overhangs for fragment assembly to form the desired product; (3) a matrix of ligation fidelity of the selected overhangs under the user-specified conditions or the computer-optimized experimental conditions; and (4) a set of primer sequences that contain selected Type IIS restriction endonuclease recognition sequences and overhang sequences plus any additional target fragment sequences for directing automated oligonucleotide synthesis.
- the set of primer sequences can be forwarded electronically to a receiving location for instructing a DNA synthesis instrument to make such primers.
- the results for a user's chosen set of overhangs can be optimized by the user providing the preferred set of conditions to achieve efficient and accurate hybridization.
- Short linkers of arbitrary sequences are preferred for large numbers of fragments (e.g., >20).
- Multiple data sets can be accessed that provide overhang optima under different conditions.
- Such assays enable the user to select a set of enzymes and reaction conditions that would give the highest possible fidelity and efficiency for a selected set of overhangs.
- partial overhang pair reaction parameters and data sets could be selected by the user and partial overhang reaction parameters and optionally data sets could be selected by the computer to provide the optimal ligation efficiency and fidelity possible to create the desired number of ligated fragments.
- 15 junction pairs might be required in total to join 16 fragments of double stranded nucleic acid fragments where 6 overhang pairs had been selected by the user and the remainder of overhangs are provided in a computer-generated output optionally with preferred experimental conditions including choice of ligase.
- the user could then be enabled to receive an additional optimized 9 overhang pairs with optional choice of reaction components such as restriction enzyme, ligase and optional choice of other reaction conditions including cycling time and temperature that would provide the highest ligation fidelity and efficiency possible for the 15-member final set.
- the user inputs into the computer a gene, gene pathway, plasmid or chromosome sequence for dividing into fragments suitable for efficient assembly with high fidelity using an optimized set of overhangs.
- the user may specify the target nucleic acid and the desired number of fragments.
- the webtool or graphical interface provides the sequence for the desired number of fragments at the optimal junctions that satisfy the hybridization parameters of the associated overhangs that, when ligated, form scarless junctions, thus enabling the user to make the target polynucleotide in the desired manner. If the user additionally specifies the minimum acceptable fidelity, the sequence specification for the desired number of fragments may be altered, and indeed the number of fragments provided to the user might change to provide the maximum number of sequences possible with junctions that provide the specified minimum acceptable fidelity.
- the user may provide the target sequence and additionally may specify some junctions to be included in the design of constituent fragments with predetermined overhangs, and some subset of reaction conditions (or all reaction conditions).
- the computer provides to the user, a list of overhangs for efficient ligation to supply the best additional junctions and/or reaction conditions.
- the assembly proceeds at either a single temperature suitable for all types of enzyme activities used in a reaction (e.g., cleavage enzymes and ligation enzymes) or any number of cycling conditions varying between an optimal cutting/melting temperature and an optimal annealing/ligation temperature.
- overhangs are generated and sealed in one pot, and multi-fragments can be joined together in one experiment.
- Implementation of nucleic acid assembly using a computer program and a general-purpose computer system The various components of the various systems described herein may be implemented as a computer program using general-purpose computer systems.
- Such a computer system typically includes a main unit connected to both an output device that displays information to a user and an input device that receives input from a user.
- the main unit generally includes a processor connected to a memory system via an interconnection mechanism.
- the input device and output device also are
- Example output devices include, but are not limited to, liquid crystal displays (LCD), plasma displays, cathode ray tubes, video projection systems and other video output devices, printers, devices for communicating over a low or high bandwidth network, including network interface devices, cable modems, and storage devices such as disk or tape.
- One or more input devices may be connected to the computer system.
- Example input devices include, but are not limited to, a keyboard, keypad, track ball, mouse, pen and tablet, touchscreen, camera, communication device, and data input devices. The invention is not limited to the particular input or output devices used in combination with the computer system or to those described herein.
- the computer system may be a general-purpose computer system, which is programmable using a computer programming language, a scripting language or even assembly language.
- the computer system may also be specially programmed, special purpose hardware.
- the processor is typically a commercially available processor.
- the general-purpose computer also typically has an operating system, which controls the execution of other computer programs and provides scheduling, debugging, input/output control, accounting, compilation, storage assignment, data management and memory management, and communication control and related services.
- the computer system may be connected to a local network and/or to a wide area network, such as the Internet. The connected network may transfer to and from the computer system program instructions for execution on the computer, media data such as video data, still image data, or audio data, metadata, review and approval information for a media composition, media annotations, and other data.
- a memory system typically includes a computer readable medium.
- the medium may be volatile or nonvolatile, writeable or non-writeable, and/or rewriteable or not rewriteable.
- a memory system typically stores data in binary form. Such data may define an application program to be executed by the microprocessor, or information stored on the disk to be processed by the application program.
- the invention is not limited to a particular memory system.
- Time-based media may be stored on and input from magnetic, optical, or solid-state drives, which may include an array of local or network attached disks.
- Systems such as those described herein may be implemented in software, hardware, firmware, or a combination of the three.
- the various elements of the systems, either individually or in combination may be implemented as one or more computer program products in which computer program instructions are stored on a computer readable medium for execution by a computer or transferred to a computer system via a connected local area or wide area network.
- Various steps of a process may be performed by a computer executing such computer program instructions.
- the computer system may be a multiprocessor computer system or may include multiple computers connected over a computer network.
- the components described herein may be separate modules of a computer program, or may be separate computer programs, which may be operable on separate computers.
- the data produced by these components may be stored in a memory storage system or transmitted between computer systems by means of various communication media such as carrier signals.
- compositions and kits may be used in a number of diagnostic and medical contexts. Some examples are given below.
- Example 5 describes the use of multi-fragment assembly methods for component sequences of Coronaviruses that can be engineered into novel virion sequences for transcription into RNA and testing as potential substrates for vaccine development.
- FIG. 14 and FIG. 15 show the improved multi-fragment assembly described herein for CAR T-cell therapy and for designing phage antibiotics. These methods rely on making large arrays of 25-50 fragments in a one-pot reaction. These arrays can then be used to target multiple genes in a single one-pot reaction. Alternatively, a single gene could be targeted multiple times and/or in multiple locations to enhance the efficiency of editing via homology-directed repair. Another application of arrays is to create or alter gene pathways, for example metabolic pathways.
- Biosensor arrays may be designed (using, for example, Cas13) to sense a wide range of nucleic acids at once, for example in a multi-pathogen sensor system.
- Example 1 Differences in frequency of ligation for different ligases caused by different 4-base sequences All enzymes (excepting hLig3) and buffers were obtained from New England Biolabs (NEB, Ipswich, MA).
- T4 DNA ligase reaction buffer is: 50 mM Tris-HCl (pH 7.5), 10 mM MgCl2, 1 mM ATP, 10 mM DTT.
- NEBNext® Quick Ligation reaction buffer is: 66 mM Tris pH 7.6 @ 25°C, 10 mM MgCl2, 1 mM DTT, 1 mM ATP, 6% Polyethylene glycol (PEG 6000).
- NEBuffer 2 (1X) is: 10 mM Tris-HCl (pH 7.9), 50 mM NaCl, 10 mM MgCl2, 1 mM DTT.
- CutSmart® Buffer (1X) is: 20 mM Tris-acetate (pH 7.9), 50 mM Potassium Acetate, 10 mM Magnesium Acetate, 100 µg/ml BSA.
- ThermoPol® buffer is: 20 mM Tris-HCl (pH 8.8), 10 mM (NH4)2SO4, 10 mM KCl, 2 mM MgSO4, 0.1% Triton X-100.
- Standard Taq polymerase buffer is: 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 1.5 mM MgCl2.
- the hLig3 beta gene was synthesized by Biomatik (Ontario, Canada) and subcloned into a pET28 plasmid in frame with an N-terminal His6-tag.
- the construct was expressed in T7 Express lysY/Iq E. coli cells (New England Biolabs, Ipswich, MA).
- initial PAGE-purified substrate precursor oligonucleotide contained a 5'-terminal region, a randomized four-base region, a BsaI-HFv2 binding site, a constant region, an internal 6-base randomized region as a control for synthesis bias, and a region corresponding to the SMRTbell sequencing adapter for Pacific Biosciences SMRT sequencing.
- the precursor oligonucleotide was extended as described previously and purified using the Monarch PCR & DNA Cleanup Kit. The extended DNA was cut using BsaI-HFv2 to generate a four-base overhang.
- substrate 100 nM
- DNA ligase: either T4 DNA ligase, T3 DNA ligase, T7 DNA ligase, PBCV-1 DNA ligase, or hLig3 at 1.75 µM final concentration
- 1X T4 DNA ligase buffer, or NEBNext® Quick Ligation reaction buffer for reactions noted as containing PEG
- Reactions were quenched with 2.5 µL of ligase reaction quench (500 mM EDTA + 2.5% v/v Proteinase K) and the sample was heated to 37°C for 30 minutes to allow for cleavage of the ligase by Proteinase K.
- the reaction was then purified using the Monarch PCR & DNA Cleanup Kit and following the Oligonucleotide Cleanup protocol. Each ligation was performed in a minimum of duplicates, and the ligation yield was determined by Agilent Bioanalyzer (DNA 1000) with error reported as one standard deviation.
- the ligated library was treated with Exonuclease III (50 U) and Exonuclease VII (5 U) in a 50 µL volume in 1X Standard Taq Polymerase buffer incubated for 1 hour at 37°C.
- the library was purified using a Monarch PCR & DNA Cleanup Kit, oligonucleotide cleanup protocol, including a second wash step, and then quantified by Agilent Bioanalyzer (DNA 1000). Typical concentrations of final library were between 0.5 and 2 ng/µL. Two replicate experiments were conducted for each ligase. Sequencing and analysis of sequencing data were performed as previously described in WO 2020/081768 and Potapov et al. (2016) Nucleic Acids Research, 46, e79-e79.
- sequencing libraries were prepared by mixing each DNA ligase (T4 DNA Ligase, T3 DNA Ligase, T7 DNA Ligase, PBCV-1 DNA Ligase, and hLig3) with a DNA hairpin substrate containing degenerate 5'-four-base overhang ends, allowing for every possible sequence context to be observed in a single reaction for each ligase (Potapov, et al. (2016) Nucleic Acids Research, 46, e79-e79). The ligase was present in a large excess compared to the DNA substrate to permit rapid ligation of short cohesive ends.
- the libraries were sequenced using PacBio SMRT sequencing and a summary of multiplex ligation data for each ligase, including the total number of ligation events, percentage of correct (Watson-Crick) vs incorrect (mismatch) ligations, and yield of ligation product obtained is provided in Table 2.
- the multiplex ligation data revealed ligation sequence bias in the preferred overhang sequences.
- the number of reads for each overhang was a proxy for its ligation efficiency; the sequence bias for each ligase was inferred from the relative frequency of each overhang appearing across all ligation products. Varying overall degrees of bias, as well as intrinsically different preferred sequences were detected between ligases (FIG. 4A). T7 DNA ligase showed the highest degree of sequence bias. All other ligases examined had a much tighter distribution of ligation frequencies, but with differences in how tightly the data points are clustered around the average.
- T4 DNA ligase and hLig3 showed the least amount of bias with the range of values more than two-fold smaller compared to T7 DNA ligase.
- PBCV-1 and T3 had a similar average ligation frequency but a slightly larger range of observed ligation frequencies.
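- The bias measure described above (read counts as a proxy for ligation efficiency, compared through relative frequencies) can be sketched as follows. The read counts are invented placeholders for illustration, not data from Table 2 or FIG. 4A, the function names are hypothetical, and every overhang is assumed to have been observed at least once.

```python
def relative_frequencies(read_counts):
    """Fraction of all ligation products in which each overhang appears."""
    total = sum(read_counts.values())
    return {overhang: n / total for overhang, n in read_counts.items()}


def bias_range(read_counts):
    """Ratio of the most to the least frequently ligated overhang.

    Assumes every overhang has a non-zero read count.
    """
    freqs = relative_frequencies(read_counts).values()
    return max(freqs) / min(freqs)


# Invented placeholder counts for three overhangs.
toy_counts = {"AAAA": 50, "GGCC": 150, "TTAC": 100}
print(relative_frequencies(toy_counts))
print(bias_range(toy_counts))
```

- A ligase with little sequence bias would give a ratio close to 1 under this measure, whereas a strongly biased ligase would give a larger ratio.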
- T4 DNA ligase displayed moderate fidelity (72% correct ligation products).
- T4 DNA ligase, T3 DNA ligase, PBCV-1 ligase and hLig3 had a broad range of fidelity for individual overhang sequences, with some overhangs having very few mismatch ligation events and others with frequent mismatch ligations (FIG. 4B). For many overhangs, even when presented with all possible partners, ligation products were almost exclusively with the Watson-Crick partner.
- T4 DNA ligase, with an overall fidelity of 72%, had a median fidelity of 90% for overhangs with 0% GC content and decreased in average fidelity with each incremental increase in GC content, ultimately falling to 52% fidelity for overhangs with 100% GC content (FIG. 5A).
- Mismatch ligations at the edge position (N1) of the 4-base overhang were dominated by G:T and T:G mismatches, accounting for 65% of all mismatch ligations at the edge.
- mismatches at the middle positions (N2 and N3) of the overhang were less tolerated by T4 DNA ligase but were still dominated by G:T mismatches.
- hLig3 showed a broad range of ligation fidelity. Most overhangs ligated with ~50% fidelity, and several overhangs (TAAG, AATA, TTAC, CCAA) ligated with >80% fidelity. The influence of GC content was weaker for hLig3, which had an average fidelity of 72% on overhangs with 0% GC content and an average fidelity of 32% for overhangs with 100% GC content (FIG. 5C). More than half of ligation products (56%) contain mismatch base pairs. hLig3 has a significant accumulation of mismatch products with more than a single base pair mismatch, and 8% of ligation products contain 2 mismatches.
- the large majority (97%) involved at least one mismatch in the edge position and typically include at least one G:T mismatch.
- while G:T and T:G mismatches were well tolerated, hLig3, T3 DNA ligase, and PBCV-1 ligase were also more permissive of purine:purine mismatches at both the edge and middle positions, with G:A and G:G mismatches ligated almost as frequently as G:T mismatches.
- T7 DNA ligase had a tighter range of ligation fidelity, with only a handful of overhangs that ligated with less than 80% fidelity. T7 DNA ligase showed over 86% average fidelity regardless of GC content. T7 DNA ligase has an overall lower tolerance for mismatch ligation, and only 12% of ligation products contain a mismatch. Similar to T4 DNA ligase, single base pair mismatches account for nearly all (98%) T7 DNA ligase mismatch ligation products, and the predominant mismatches are G:T and T:G at the edge position and G:T in the middle position of the 4-base sequence.
- the computer design tool for determining overhangs to optimize ligation fidelity in FIG. 9A has three components: the Ligase fidelity viewer (see FIG. 9B), the GetSet viewer (see FIG. 9C) and the SplitSet interface (see FIG. 9D), which together form the ligation fidelity tool (see for example WO 2020/081768). All three computer design tools have relied on a single ligase (T4 DNA ligase).
- the data obtained here add to these three tools by providing a choice of preferences under the menu of ligation conditions.
- a ligase can be selected having different base sequence preferences that affect the choice of overhangs. The benefit of this additional data will improve the accuracy of the tools for ordered assembly of multi-fragments.
- the data is obtained from 4-base overhangs but can be readily repeated for 2-base, 3-base and 5-base overhangs.
- the data also provides the user with a refined estimate of assembly fidelity for a given set of user-supplied overhangs and identifies problematic overhang pairings with a high potential for mismatch ligation if this is undesirable.
- the GetSet tool allows users to generate overhang sets with maximum assembly fidelity using automated overhang selection.
- GetSet returns a high-fidelity overhang set matching input criteria of number of overhangs, length of overhangs and ligation conditions. Users can specify overhang sequences that must be included or excluded from the results.
- GetSet does not use pre-calculated results and instead identifies de novo high-fidelity overhang sets using a stochastic search algorithm. Consequently, the stochastic search algorithm may return different recommended overhang sets from the same input criteria, meaning repeating a search can result in different junctions with similar predicted fidelities. We have therefore included a feature to save and recall prior GetSet search results.
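- The following is a minimal sketch of a stochastic search of the general kind described above; it is not the GetSet algorithm itself. The scoring function `toy_score` is a stand-in assumption (minimum Hamming distance among all overhangs and their reverse complements) used in place of the experimentally derived ligation fidelity data, and all names are hypothetical. A `seed` argument is included so that a prior search can be reproduced, loosely analogous to saving and recalling results.

```python
import random
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def toy_score(overhangs):
    """Placeholder score (higher is better): the smallest Hamming distance
    between any two ends, counting each overhang and its reverse complement."""
    ends = [e for oh in overhangs for e in (oh, revcomp(oh))]
    return min(
        sum(x != y for x, y in zip(a, b))
        for i, a in enumerate(ends)
        for b in ends[i + 1:]
    )


def get_set(size, required=(), excluded=(), length=4, steps=2000, seed=None):
    """Random-replacement hill climb; required overhangs are never replaced."""
    if size <= len(required):
        raise ValueError("size must exceed the number of required overhangs")
    rng = random.Random(seed)
    banned = set(required) | set(excluded)
    pool = [
        oh for oh in ("".join(p) for p in product("ACGT", repeat=length))
        if oh not in banned and oh != revcomp(oh)  # skip palindromes
    ]
    current = list(required) + rng.sample(pool, size - len(required))
    best = toy_score(current)
    for _ in range(steps):
        candidate = list(current)
        candidate[rng.randrange(len(required), size)] = rng.choice(pool)
        score = toy_score(candidate)
        if score >= best:  # accept ties so the search keeps moving
            current, best = candidate, score
    return current, best


overhang_set, score = get_set(10, required=("GGAG",), seed=1)
print(overhang_set, score)
```

- Running the search without a seed may return a different set on each call, which mirrors the behavior noted above for stochastic searches.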
- the GetSet tool was used to expand a standard overhang set used in plant synthetic biology; the set size could be increased from 11 overhangs to 20 overhangs with only marginal decrease in the predicted assembly fidelity from 81% to 80%.
- the SplitSet tool designs high-fidelity assembly fragments from a desired target DNA sequence.
- users input a DNA sequence, the desired number of fragments, ligation conditions and approximate search windows for fusion sites (by default, the program chooses equally spaced search intervals).
- the SplitSet tool divides the input DNA sequence at the highest fidelity set of junctions within the parameters chosen.
- users can exclude specific fusion site sequences to ensure compatibility with pre-existing modular cloning systems or include fixed sites by setting a narrow search window to cover which site or sites must be used.
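- A simplified sketch of the kind of division SplitSet performs is given below. It is an assumption-laden illustration: junctions are chosen greedily, nearest to equally spaced positions, using only the palindrome and uniqueness rules rather than measured ligation fidelity, and the function name and parameters are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def split_sequence(seq, n_fragments, window=10, length=4, excluded=()):
    """Pick one junction per equally spaced search window.

    Returns (cuts, fragments); adjacent fragments share the junction overhang.
    """
    seq = seq.upper()
    used = set(excluded)
    cuts = []
    for k in range(1, n_fragments):
        centre = k * len(seq) // n_fragments
        # Try positions closest to the window centre first.
        for offset in sorted(range(-window, window + 1), key=abs):
            pos = centre + offset
            overhang = seq[pos:pos + length] if pos >= 0 else ""
            if (
                len(overhang) == length
                and overhang != revcomp(overhang)
                and overhang not in used
                and revcomp(overhang) not in used
            ):
                used.add(overhang)
                cuts.append((pos, overhang))
                break
        else:
            raise ValueError(f"no usable junction near position {centre}")
    bounds = [0] + [pos for pos, _ in cuts] + [len(seq) - length]
    fragments = [
        seq[bounds[i]:bounds[i + 1] + length] for i in range(len(bounds) - 1)
    ]
    return cuts, fragments
```

- A fixed site can be imitated in this sketch by calling it with a narrow `window`, in the same spirit as the narrow search window described above.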
- Additional features include checking the fragments for the presence of any internal sites that might affect the choice of Type IIS restriction enzyme to direct an assembly, and alerting the user to remove such internal sites via domestication.
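- The internal-site check can be sketched as a search of both strands for the recognition sequence of each candidate enzyme. The function names are hypothetical; the recognition sequences are those commonly listed for these enzymes and should be confirmed against the supplier's documentation before use.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Recognition sequences (5'->3') as commonly listed for each enzyme.
RECOGNITION_SITES = {
    "BsaI": "GGTCTC",
    "BsmBI": "CGTCTC",
    "BbsI": "GAAGAC",
    "SapI": "GCTCTTC",
    "PaqCI": "CACCTGC",
}


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_internal_sites(sequence, enzyme):
    """Return sorted (position, strand) hits for the enzyme's recognition site."""
    site = RECOGNITION_SITES[enzyme]
    seq = sequence.upper()
    hits = []
    for motif, strand in ((site, "+"), (revcomp(site), "-")):
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)
    return sorted(hits)


def compatible_enzymes(sequence):
    """Enzymes whose recognition site is absent from both strands."""
    return [e for e in RECOGNITION_SITES if not find_internal_sites(sequence, e)]


example = "AAAGGTCTCAAAGAGACCTTT"
print(find_internal_sites(example, "BsaI"))
print(compatible_enzymes(example))
```

- Any hit reported for the chosen enzyme marks a site that would need to be removed (domesticated), or a reason to choose a different enzyme.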
- the program can also automatically generate a set of primers for the DNA fragments to add the flanking bases and recognition sites required either for amplicon generation of inserts to be directly used or for pre-cloning purposes.
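- As a hedged illustration of primer generation, the sketch below prepends flanking bases, a Type IIS recognition site and a spacer to the ends of a fragment that already begins and ends with its junction overhangs. The defaults (a BsaI site followed by a one-base spacer, a six-base flank, and a fixed 20-base annealing region) are assumptions chosen for the example; a real design would also balance melting temperatures.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def design_primers(fragment, site="GGTCTC", spacer="A",
                   flank="GCGCGC", anneal_len=20):
    """Return (forward, reverse) primers written 5'->3'.

    `fragment` is the top strand and must start and end with its overhangs.
    The flank, spacer and annealing length are illustrative placeholders.
    """
    fragment = fragment.upper()
    if len(fragment) < anneal_len:
        raise ValueError("fragment shorter than the annealing region")
    forward = flank + site + spacer + fragment[:anneal_len]
    reverse = flank + site + spacer + revcomp(fragment[-anneal_len:])
    return forward, reverse


fwd, rev = design_primers("GGAGATGACCATGATTACGGATTCACTGGCCGTCGTTTTACTCGC")
print(fwd)
print(rev)
```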
- a report can be generated describing the full assembly with a color-coded graphical readout, the final assembly sequence, and descriptions of each junction between inserts.
- Example 3 Aprataxin/5' deadenylase and PEG in Golden Gate assembly
- the correctly assembled target polynucleotides coded for a cassette of the lac operon (about 5 kb) so that blue colonies of correctly assembled fragments could be distinguished from white colonies containing incorrect assemblies on IPTG/Xgal/Chloramphenicol plates.
- Example 4 Testing the effect of reaction temperature on multi-fragment assembly fidelity and assembly of the lac operon cassette from 52 fragments
- Multi-fragment assembly that relies on a two enzyme mix (restriction endonuclease and ligase) typically utilizes two step cycling protocols, alternating between a 16°C incubation step to maximize DNA ligation efficiency and a 37-42°C incubation step to maximize fragment digestion efficiency.
- the omission of 16°C incubation was tested to determine the effect on multi-fragment assembly fidelity, as higher reaction temperatures have been shown to improve DNA ligase fidelity.
- the frequency of multi-fragment assembly errors at 37°C or 42°C was quantified in a multiplex high throughput DNA sequencing assay, and the results compared to reactions using traditional thermocycling protocols of 37/16°C or 42/16°C.
- Mismatch frequencies for assembly reactions were grouped according to nucleotide mispair (A:A, A:C, A:G, C:C, C:T, G:G, G:T, T:T). Assembly reactions were carried out with T4 DNA ligase and either BsaI-HFv2 at 37°C or BsmBI-v2 at 42°C. For comparison, mismatch frequencies are shown for assembly reactions using traditional thermocycling protocols with T4 DNA ligase and either BsaI-HFv2 at 37°C and 16°C or BsmBI-v2 at 42°C and 16°C. Mismatch frequency was significantly lower using BsaI-HFv2 (37°C) or BsmBI-v2 (42°C) at a single temperature than observed for cycling.
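- The grouping by nucleotide mispair can be sketched as below. Two overhangs written 5'->3' anneal antiparallel, so position i of one faces position n-1-i of the other. The function name is hypothetical, and treating the first and last positions as "edge" and the remainder as "middle" is an assumption of this sketch.

```python
WATSON_CRICK = {"A": "T", "C": "G", "G": "C", "T": "A"}


def mispairs(overhang, partner):
    """List (position, location, 'X:Y') for each mispaired position.

    Both overhangs are written 5'->3'; labels are sorted alphabetically so
    that G:T and T:G fall into the same group.
    """
    n = len(overhang)
    found = []
    for i, base in enumerate(overhang):
        opposite = partner[n - 1 - i]
        if WATSON_CRICK[base] != opposite:
            label = ":".join(sorted((base, opposite)))
            location = "edge" if i in (0, n - 1) else "middle"
            found.append((i + 1, location, label))
    return found


print(mispairs("GGAG", "CTCC"))  # fully complementary partner
print(mispairs("GGAG", "CTCT"))  # one mispaired position
```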
- a 4.9 kb cassette of the lac operon was cloned into an E. coli destination vector from 52 constituent parts in a single assembly round.
- the lac operon cassette system used here mimics traditional cloning reactions wherein, upon transformation of the assembly reaction into E. coli cells, colonies harboring correctly or incorrectly assembled constructs can be readily observed.
- This test system provides a colorimetric readout to differentiate transformants harboring correctly and incorrectly assembled products.
- Plasmid DNA was isolated from 18 blue colonies using the Monarch Plasmid Miniprep Kit (New England Biolabs, Ipswich, MA). Twelve of the resulting constructs were subjected to PCR with amplification primers that flank the desired insertion site. Every construct yielded an amplicon size consistent with assembly of all 52 fragments, demonstrating that blue colonies contained the desired number of inserts. Six of the isolated constructs were sequenced using nine different sequencing primers to cover the entire 4.9 kb expected insert. All 6 constructs contained ordered error-free assembly of all 52 inserts.
- GGAG CCAG, ATGT, TACA, GGCA, TATC, TAAG, CAGC, GAAC, CAAC, GCTT, TAGT, CTAT, GGAA, TTCG, AGAC,
- GTAT GTAT, GCGT, GATT, TTAC, TATT, TCGT, CAGA, GGGA, CTCA, GCAA, TGGA, CGTC, AACC, AGTA, TAGA, GAAA,
- constructs were purified from a subset of colonies and the inserts analyzed by PCR and Sanger sequencing; all colonies subjected to additional screening were found to harbor constructs with inserts of the anticipated size and sequence.
- the one-step assembly of phage T7 DNA and the lac operon cassette demonstrates an efficient and cost-effective means to create and engineer variants of large/complex DNA constructs that are difficult to obtain and manipulate by current cloning and gene synthesis methodologies.
- Multi-fragment assembly is shown here for rapid assembly of toxic and/or high molecular weight DNA constructs from dozens of smaller constituent parts that are easily manipulated and propagated using standard molecular biology techniques.
- Example 5 Rapid one-pot DNA molecule construction from 50 fragments of 40 kb T7 phage DNA Enzymes, buffers, and media were obtained from New England Biolabs (NEB, Ipswich, MA), unless otherwise noted. Synthetic oligonucleotides were obtained from either Integrated DNA Technologies (IDT, Coralville, IA) or Sigma Aldrich (Sigma, St. Louis, MO). As the phage genome contains many genes that are toxic to E. coli cells, the phage gDNA was reconstructed from PCR-generated DNA fragments to avoid subcloning toxic genes. Using this strategy, 16 silent mutations were introduced into the phage genome to remove pre-existing BsmBI Type IIS restriction sites within the genome. These changes served the dual purpose of both permitting Type IIS assembly and acting as marker mutations for assembly verification.
- AAAT AGAA, AGCG, ATGT, TAGT, TCGC, CTGG, ACAA, AGAC, GCTG, GGCA, ACCC, ACCG, AAGC, TACT, AATC, AAGG, GAAA, GGTT, CAAC, CGTC, CCTA, TGGG, TAAG, TCAT, ACGG, GTAA, CATT, TATC, TGAG, GCAC, CCAC, TTCG, TCTG, AGGA, ACGC, TGGC, GTAT, CGTG, CTAT, GAGA, ACTC, GGTG, TCCA, GGGA, GTTC, TTGC, GAAG, GGAA, CAAA, ATCA, TGTT
- Assembly fragments were generated by PCR (Q5® Hot-Start High-Fidelity 2X Master Mix (New England Biolabs, Ipswich, MA)) with oligonucleotide primers (IDT) and purified using the Monarch PCR & DNA Cleanup Kit. Fragment quality was evaluated using the Agilent Bioanalyzer 2100 and each assembly part was quantified using the Qubit Assay (Thermo Fisher Scientific, Waltham, MA).
- Multi-fragment assembly reactions (5 µL final volume) were carried out with 3 nM of each DNA fragment and 0.5 µL of the appropriate multi-fragment assembly mix (NEB® Golden Gate Assembly Mix (New England Biolabs, Ipswich, MA)) in 1X T4 DNA ligase buffer; the BsmBI-v2 mix was used to assemble the T7 phage genome.
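- To make the quantities above concrete, the sketch below converts the stated 3 nM of each fragment in a 5 µL reaction into femtomoles and an approximate mass. The average mass of about 650 g/mol per base pair is a common approximation, and the 800 bp fragment length is only an illustrative value (roughly 40 kb divided by 50 fragments); both are assumptions of this sketch.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair of dsDNA, approximate


def fmol_needed(conc_nM, volume_uL):
    """nmol/L multiplied by µL gives femtomoles."""
    return conc_nM * volume_uL


def mass_ng(fmol, length_bp):
    """Approximate mass in ng of `fmol` femtomoles of a dsDNA fragment."""
    return fmol * length_bp * AVG_BP_MASS * 1e-6


amount = fmol_needed(3, 5)  # 3 nM in a 5 µL reaction
print(amount, "fmol per fragment")
print(mass_ng(amount, 800), "ng for an 800 bp fragment")
```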
- Reactions to produce the T7 bacteriophage genome were cycled between 42°C and 16°C for 5 minutes at each temperature for 96 cycles, and then subjected to a 60°C incubation for 5 minutes and finally a 4°C hold until transformation into E. coli.
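- The cycling program above can be written down as data and its run time computed, as in the following sketch. The representation of steps as (temperature, minutes) pairs is an assumption made for illustration; the final 4°C hold is open-ended and is therefore left out of the total.

```python
def protocol_minutes(cycle_steps, cycles, final_steps):
    """Total programmed time, excluding any open-ended hold."""
    per_cycle = sum(minutes for _, minutes in cycle_steps)
    return cycles * per_cycle + sum(minutes for _, minutes in final_steps)


# (temperature in °C, minutes) for the T7 phage genome assembly
t7_cycle = [(42, 5), (16, 5)]
t7_final = [(60, 5)]

total = protocol_minutes(t7_cycle, 96, t7_final)
print(f"{total} minutes ({total // 60} h {total % 60} min)")
```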
- the assembled T7 phage genome was transformed into NEB 10-beta electrocompetent cells as per the manufacturer's instructions, using 1 µL of the reaction mixture into 25 µL of competent cells.
- the transformation mixture was recovered in 975 µL of NEB 10-beta/stable outgrowth media and then combined with 3 mL of 50°C molten top-agar (Luria broth containing 0.7% agar).
- the resulting plates were inverted and incubated at 37°C for ~5 hours until the E. coli lawn and phage plaques were visible by eye.
- about 20 bacteriophage plaques per µL of assembly reaction were obtained, indicating successful assembly of the phage genome.
- phage plaques were selected for additional screening by plaque PCR and restriction enzyme digest to ensure they contained a complete and correctly ordered copy of the T7 phage genome; all plaques subjected to additional screening contained the expected genome arrangement and harbored the intended silent mutations.
- Plaque PCR was carried out using 4 sets of amplification primers that together span the 40 kb phage genome. Amplicon lengths were resolved by Agilent Bioanalyzer 2100, using a DNA 12000 assay. Amplicons from 5 phage plaques were compared to the parental wt T7 phage genome after restriction enzyme digest with NdeI or undigested. In all cases, the phage plaques produced a pattern identical to the parental wt T7 gDNA.
- Table 7 Examples of computer-generated optimized overhang sets according to the methods herein.
- GGAG GATA, GGCA, GGTC, TCGC, GAGG, CAGT, GTAA, TCCA, CACA, GAAT, ATAG, AGTA, ATCA, TCTT, AGGT, CAAA, AAGC, GCAC, CAAC, CGAA, GTCT, TCAG, CCAT
- GGAG GATA, GGCA, GGTC, TCGC, GAGG, CAGT, GTAA, TCCA, CACA, GAAT, ATAG, AGTA, ATCA, TCTT, AGGT,
- TTGC TGGA, TGAG, TAGG, ACAG, AAGC, AGCC, GTCA, CGTT, ATTT, TTCT, GAAA, GATG, GTAT, GCAC, TCGT,
- GGTC CGGG
- CACT ACTA
- ACCT ACCT
- TCTC ATGG
- GTAG GTAG
- AAAC AACA, AAGA, AAGT, AATG, ACAC, ACGA, AGAA, AGCC, AGGG, AGTA, ATAG, ATCA, ATGA, ATTC, CAAA, CACG, CAGA, CCAG, CCTA, CGAA, CGGC, CTCC, CTTA, GAGC, GATA, GCAA, GGGA, GTAA, TCCA
- AAAT AGAA, AGCG, ATGT, TAGT, TCGC, CTGG, ACAA, AGAC, GCTG, GGCA, ACCC, ACCG, AAGC, TACT, AATC, AAGG, GAAA, GGTT, CAAC, CGTC, CCTA, TGGG, TAAG, TCAT, ACGG, GTAA, CATT, TATC, TGAG, GCAC, CCAC, TTCG, TCTG, AGGA, ACGC, TGGC, GTAT, CGTG, CTAT, GAGA, ACTC, GGTG, TCCA, GGGA, GTTC, TTGC, GAAG, GGAA, CAAA, ATCA, TGTT
- Example 6 Synthesis and Engineering of a Viral Genome for Research and Vaccine Development e.g. Coronavirus
- fragments may be contained in plasmids having a recognition sequence for a selected restriction endonuclease at the insertion sites.
- the restriction endonuclease may be selected from: Bsal-HFv2, BsmBI-v2, Bbsl-HF, Sapl, BspQI and PaqCI. If PaqCI is selected for viral assembly, an activator molecule as described above should be included in the reaction mix.
- T4 DNA ligase buffer (10X) 0.5-2 ul of PaqCI (lOu/ul), PaqCI activator (20uM) 0.25-0.5 ul
- T4 DNA ligase (400 u/u I) 0.50-2 ul and nuclease free water to 20 ul.
- the assembly protocol is (37°C, 5 minutes -> 16°C, 5 minutes) x 30-60 cycles -> 37°C, 5 minutes -> 60°C, 5 minutes.
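- As a planning aid, the cycling program above can be written out as data and its programmed hold time estimated. This is a minimal sketch: the names `CYCLE_STEPS`, `FINAL_STEPS` and `total_hold_minutes` are illustrative assumptions, and ramp times between temperatures are ignored.

```python
# Hypothetical helper: estimate the programmed hold time of the cycling
# protocol described above. Ramp times between temperatures are ignored
# (an assumption of this sketch), so real run times will be longer.

CYCLE_STEPS = [(37, 5), (16, 5)]   # (temperature in deg C, minutes), repeated each cycle
FINAL_STEPS = [(37, 5), (60, 5)]   # performed once after the last cycle


def total_hold_minutes(cycles: int) -> int:
    """Return the total programmed hold time in minutes for `cycles` cycles."""
    if cycles < 1:
        raise ValueError("cycles must be at least 1")
    per_cycle = sum(minutes for _, minutes in CYCLE_STEPS)
    final = sum(minutes for _, minutes in FINAL_STEPS)
    return cycles * per_cycle + final


if __name__ == "__main__":
    for n in (30, 60):
        print(f"{n} cycles: {total_hold_minutes(n)} min of hold time")
```

- Under these assumptions, the 30-cycle and 60-cycle ends of the stated range correspond to roughly 5 and 10 hours of hold time respectively.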
- Fidelity is calculated as the fraction of correct ligations divided by the total fraction of ligations for a given overhang.
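- The fidelity definition above can be illustrated with a short calculation. This is a sketch under stated assumptions: ligation events are stored as counts keyed by the pair of overhangs that were joined (both written 5' to 3'), a "correct" ligation is taken to mean pairing with the Watson-Crick reverse complement, and the counts in `EXAMPLE_COUNTS` are invented for illustration and are not data from this document.

```python
# Sketch: per-overhang fidelity from a table of observed ligation counts.
# EXAMPLE_COUNTS is made-up illustrative data, not measured values.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def fidelity(overhang: str, counts: dict[tuple[str, str], int]) -> float:
    """Correct ligations divided by all ligations involving `overhang`."""
    partner = reverse_complement(overhang)
    total = 0
    correct = 0
    for (a, b), n in counts.items():
        if overhang not in (a, b):
            continue  # this event does not involve the overhang of interest
        total += n
        other = b if a == overhang else a
        if other == partner:
            correct += n
    if total == 0:
        raise ValueError(f"no ligation events recorded for {overhang}")
    return correct / total


EXAMPLE_COUNTS = {
    ("GGAG", "CTCC"): 980,   # Watson-Crick pair
    ("GGAG", "CTCA"): 15,    # mismatch ligation
    ("GGAG", "TTCC"): 5,     # mismatch ligation
    ("GATA", "TATC"): 1000,  # Watson-Crick pair
}

if __name__ == "__main__":
    for oh in ("GGAG", "GATA"):
        print(oh, round(fidelity(oh, EXAMPLE_COUNTS), 3))
```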
- Segment options for use in assembly of a coronavirus genome using the multi-fragment assembly method are provided.
- Table 11 Segments of a 50-fragment viral genome (Coronavirus CV-2 genome) (GenBank ID: NC_045512)
- Example 7 An automated workflow for generating an ordered assembly of polynucleotides into a target polynucleotide
- a workflow could be largely or entirely accomplished in a single machine with various component inputs presented together or sequentially.
- a desired sequence is entered into the computer.
- the computer then provides an output describing the suitable fragments, and overhangs derived from the ligation frequency table to which sets of rules have been attached for ordered assembly of the desired sequence.
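- The kind of output described above can be sketched in a few lines. This is not the rule set of the methods herein: in place of the ligation frequency table, the sketch substitutes three simple checks (no palindromic overhangs, no duplicates, no overhang that is the reverse complement of another). The function names, the even spacing of junctions and the 4-base overhang length are assumptions made for illustration.

```python
# Sketch: split a desired sequence into fragments and screen junction overhangs.
# The screening rules here are a simplified stand-in for a ligation frequency
# table; all names are hypothetical.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def junction_overhangs(sequence: str, n_fragments: int,
                       overhang_len: int = 4) -> list[tuple[int, str]]:
    """Return (position, overhang) for each internal junction, evenly spaced."""
    if n_fragments < 2:
        raise ValueError("need at least 2 fragments")
    step = len(sequence) / n_fragments
    junctions = []
    for i in range(1, n_fragments):
        pos = round(i * step)
        overhang = sequence[pos:pos + overhang_len]
        if len(overhang) < overhang_len:
            raise ValueError("sequence too short for the requested junctions")
        junctions.append((pos, overhang))
    return junctions


def rule_violations(overhangs: list[str]) -> list[str]:
    """List overhangs that break the simplified screening rules."""
    problems = []
    seen: set[str] = set()
    for oh in overhangs:
        rc = reverse_complement(oh)
        if oh == rc:
            problems.append(f"{oh}: palindromic, can self-ligate")
        elif oh in seen:
            problems.append(f"{oh}: duplicate overhang")
        elif rc in seen:
            problems.append(f"{oh}: reverse complement of another overhang")
        seen.add(oh)
    return problems


if __name__ == "__main__":
    seq = "ATGGAGCTTAAGCGATACCGTTGCAAGGTCCA"  # toy 32-nt sequence
    junctions = junction_overhangs(seq, 4)
    print(junctions)
    print(rule_violations([oh for _, oh in junctions]) or "no violations")
```

- A fuller implementation would shift a junction by a few bases whenever its overhang fails screening, and would score candidate sets against measured ligation frequencies rather than these fixed rules.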
- the computer output might interface with a lab on a chip or other instrument containing multiple reagent compartments.
- the regulation of reaction steps may be controlled on a chip by electrowetting based liquid transfer.
- The AQdrop® platform (Sharp Life Sciences, Oxford, UK) enables micro-scale droplets to be electronically manipulated on a "lab-on-a-chip" device.
- Another platform is acoustic-based liquid transfer (Beckman Coulter, Brea, CA).
- the workflow may be performed using magnetic beads to remove unwanted enzymes/primers from a reaction vessel at different stages as needed.
- fragments may be synthesized in situ or from a secondary source according to the computer output.
- the synthesized fragments can be amplified by cloning or by an amplification method such as PCR. The latter may be achieved by combining all the separate synthesized fragments in a single mixture and performing multiplex PCR.
- the polymerase may be inactivated, and a ligase and a restriction endonuclease added to achieve ordered DNA assembly using the methods described herein.
- the subsequent assembled target DNA may be: (i) incorporated into a vector that in turn is introduced into a host cell by transformation of the vector; (ii) encapsulated into a virus and introduced into a host cell by infection; (iii) in the form of naked DNA or with a chaperone molecule, introduced directly into a eukaryotic cell; (iv) introduced into an in vitro expression system to determine whether the transcript of the assembled DNA is functional.
- a product of the assembly could be moved to a platform location for sequencing, for example by means of a whole-molecule sequencer (Oxford Nanopore or Pacific Biosciences).
- Ordered assembly of DNA molecules using the methods described herein is a powerful tool for synthesizing individual genes or metabolic pathways and also for potentially modifying eukaryotic cells genetically. It also provides a means for synthesizing toxic proteins such as novel nucleases, to determine their specificity and other functions. Ordered assemblies encoding toxic proteins may be transcribed using an in vitro transcription system (New England Biolabs, Ipswich, MA) and then tested for DNA cleavage to determine whether a desired function is achieved. The selected positive proteins can then be manufactured in cells under specialized conditions.
- a first step would be to synthesize a set of fragments of at least 20 bases in length enzymatically (e.g., using a terminal transferase) or by chemical synthesis or as a product of PCR from a larger substrate or a set of overlapping fragments. These fragments can be assembled using the protocols described herein. An assembly of 50 fragments of 25-bases would generate a target polynucleotide of 1000 bases. The restriction endonuclease and ligase can optionally be heat killed at 60°C prior to the next assembly step.
- the assembly process may be repeated again with the newly created polynucleotide fragments.
- primers, aptamers and polymerases for amplifying newly formed polynucleotide fragments from the previous step can then be generated by multiplex PCR.
- the amplified polynucleotide fragments are subjected to restriction endonuclease cleavage and ligation to generate a 12,500 bp fragment from 50 x 250 bp polynucleotides or a 50,000 bp polynucleotide from 50 x 1000 bp fragments.
- the assembly can then be repeated, for example by combining 50 x 12,500 bp fragments (625 kb) or 50 x 50,000 bp fragments (2.5 Mb), followed by cleavage and ligation to generate a 625 kb or 2.5 Mb target polynucleotide.
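- The size arithmetic of the repeated assembly rounds above can be checked with a short calculation. This is a sketch: `hierarchical_sizes` is a hypothetical helper, and it treats each product length as fragment length times fragment count, ignoring the few bases shared at each overhang junction.

```python
# Sketch: product length after successive rounds of 50-fragment assembly.
# Lengths are upper bounds; bases shared at overhang junctions are ignored.

def hierarchical_sizes(fragment_bp: int, fragments_per_round: int,
                       rounds: int) -> list[int]:
    """Return the product length in bp after each assembly round."""
    sizes = []
    size = fragment_bp
    for _ in range(rounds):
        size *= fragments_per_round
        sizes.append(size)
    return sizes


if __name__ == "__main__":
    for start in (250, 1000):
        print(f"{start} bp fragments:", hierarchical_sizes(start, 50, 2))
```

- Starting from 250 bp fragments, two rounds give 12,500 bp and then 625,000 bp; starting from 1000 bp fragments, they give 50,000 bp and then 2,500,000 bp, matching the figures in the text.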
- Another example of a workflow would be a one-step DNA assembly using a large number of DNA fragments of a size ranging from 200-1000 bp.
- the efficiency of ligation of fragments depends on the overhangs, enzymes and experimental conditions but does not depend on the length of the polynucleotide fragments used in assembly, at least up to 1 kb.
- the assembly of a large number of fragments (more than 24) may preferably be accomplished by extended incubation periods. Where these incubation periods exceed 24 hours, it may be preferable to use a static ligation protocol instead of a touch-down or drop-down protocol.
- the wt T4 ligase may be used in thermocycling up to temperatures defined by drop-down conditions of 42°C/16°C. Above 42°C, a thermostable ligase is preferable.
- A one-tube, multiple-construct emulsified ordered assembly workflow could enable users to generate different constructs from multiple fragments in a small droplet-based format where "positive" drops can be sorted for downstream applications by FACS. Mismatch connections during ordered assembly could enable users to generate different variations of constructs in one tube by the purposeful use of one or more overhangs that pair well with multiple partners. For example, a user could generate the same genetic circuit with several different promoters in one tube and identify the best construct through genetic screening.
- DNA Origami could enable users to assemble DNA structures to facilitate transfection and consistent genetic regulation by controlling shape of assembled molecule.
- Branched Construct Generation could enable users to create futuristic constructs with branched configurations for parallel regulation. For example, use of non-standard assembly fragments (1 duplex to 2 duplex connectors etc.) could position two coding sequences close to the same insulator element.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020237024116A KR20230121625A (en) | 2020-12-15 | 2021-12-15 | Compositions and methods for improved in vitro assembly of polynucleotides |
CN202180092821.0A CN116848244A (en) | 2020-12-15 | 2021-12-15 | Compositions and methods for improving polynucleotide in vitro assembly |
JP2023536435A JP2024500105A (en) | 2020-12-15 | 2021-12-15 | Compositions and methods for improved in vitro assembly of polynucleotides |
EP21857017.4A EP4263827A2 (en) | 2020-12-15 | 2021-12-15 | Compositions and methods for improved in vitro assembly of polynucleotides |
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202063125530P | 2020-12-15 | 2020-12-15 | |
US63/125,530 | 2020-12-15 | ||
US202163213807P | 2021-06-23 | 2021-06-23 | |
US202163213859P | 2021-06-23 | 2021-06-23 | |
US63/213,859 | 2021-06-23 | ||
US63/213,807 | 2021-06-23 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2022132198A2 (en) | 2022-06-23 |
WO2022132198A3 (en) | 2022-08-18 |
Family
ID=80682842
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2021/010063 WO2022132198A2 (en) | 2020-12-15 | 2021-12-15 | Compositions and methods for improved in vitro assembly of polynucleotides |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP4263827A2 (en) |
JP (1) | JP2024500105A (en) |
KR (1) | KR20230121625A (en) |
WO (1) | WO2022132198A2 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US12188011B2 (en) | 2018-10-19 | 2025-01-07 | New England Biolabs, Inc. | Compositions and methods for improved in vitro assembly of polynucleotides |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7435572B2 (en) | 2002-04-12 | 2008-10-14 | New England Biolabs, Inc. | Methods and compositions for DNA manipulation |
WO2020081768A1 (en) | 2018-10-19 | 2020-04-23 | New England Biolabs, Inc. | Improved ordered assembly of multiple dna fragments |
WO2020181768A1 (en) | 2019-03-14 | 2020-09-17 | 南京德朔实业有限公司 | Electric screwdriver |
-
2021
- 2021-12-15 KR KR1020237024116A patent/KR20230121625A/en active Search and Examination
- 2021-12-15 EP EP21857017.4A patent/EP4263827A2/en active Pending
- 2021-12-15 JP JP2023536435A patent/JP2024500105A/en active Pending
- 2021-12-15 WO PCT/US2021/010063 patent/WO2022132198A2/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
JP2024500105A (en) | 2024-01-04 |
WO2022132198A3 (en) | 2022-08-18 |
KR20230121625A (en) | 2023-08-18 |
EP4263827A2 (en) | 2023-10-25 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WWE | Wipo information: entry into national phase |
Ref document number: 2023536435 Country of ref document: JP |
|
ENP | Entry into the national phase |
Ref document number: 20237024116 Country of ref document: KR Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
WWE | Wipo information: entry into national phase |
Ref document number: 202180092821.0 Country of ref document: CN |
|
ENP | Entry into the national phase |
Ref document number: 2021857017 Country of ref document: EP Effective date: 20230717 |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21857017 Country of ref document: EP Kind code of ref document: A2 |