WO2016156469A1 - Genome architecture mapping on chromatin
- Publication number
- WO2016156469A1 (application PCT/EP2016/057025)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- loci
- gam
- dna
- segregation
- chromatin
Concepts
- 238000013507 mapping Methods 0.000 title claims abstract description 44
- 108010077544 Chromatin Proteins 0.000 title abstract description 131
- 210000003483 chromatin Anatomy 0.000 title abstract description 131
- 108090000623 proteins and genes Proteins 0.000 claims abstract description 189
- 239000012634 fragment Substances 0.000 claims abstract description 112
- 238000000034 method Methods 0.000 claims abstract description 102
- 230000003993 interaction Effects 0.000 claims abstract description 90
- 150000007523 nucleic acids Chemical class 0.000 claims abstract description 71
- 102000039446 nucleic acids Human genes 0.000 claims abstract description 70
- 108020004707 nucleic acids Proteins 0.000 claims abstract description 70
- 238000005204 segregation Methods 0.000 claims abstract description 63
- 102000004169 proteins and genes Human genes 0.000 claims abstract description 36
- 210000004940 nucleus Anatomy 0.000 claims abstract description 31
- 230000014509 gene expression Effects 0.000 claims abstract description 24
- 238000004132 cross linking Methods 0.000 claims abstract description 23
- 230000001105 regulatory effect Effects 0.000 claims abstract description 20
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 claims abstract description 19
- 201000010099 disease Diseases 0.000 claims abstract description 17
- 238000007619 statistical method Methods 0.000 claims abstract description 9
- 108020004414 DNA Proteins 0.000 claims description 174
- 238000012163 sequencing technique Methods 0.000 claims description 88
- 238000001514 detection method Methods 0.000 claims description 52
- 239000003623 enhancer Substances 0.000 claims description 51
- 238000002360 preparation method Methods 0.000 claims description 43
- 210000000349 chromosome Anatomy 0.000 claims description 38
- 108091032973 (ribonucleotides)n+m Proteins 0.000 claims description 16
- 238000013518 transcription Methods 0.000 claims description 15
- 230000035897 transcription Effects 0.000 claims description 15
- 239000011324 bead Substances 0.000 claims description 10
- 230000000694 effects Effects 0.000 claims description 10
- 238000011282 treatment Methods 0.000 claims description 9
- 239000000203 mixture Substances 0.000 claims description 8
- 239000006285 cell suspension Substances 0.000 claims description 6
- 238000001415 gene therapy Methods 0.000 claims description 6
- 238000007481 next generation sequencing Methods 0.000 claims description 6
- 238000000926 separation method Methods 0.000 claims description 6
- 102000004190 Enzymes Human genes 0.000 claims description 5
- 108090000790 Enzymes Proteins 0.000 claims description 5
- 239000003814 drug Substances 0.000 claims description 5
- 229940079593 drug Drugs 0.000 claims description 5
- 238000010008 shearing Methods 0.000 claims description 5
- 108091023040 Transcription factor Proteins 0.000 claims description 4
- 102000040945 Transcription factor Human genes 0.000 claims description 4
- 238000000265 homogenisation Methods 0.000 claims description 4
- 210000003527 eukaryotic cell Anatomy 0.000 claims description 3
- 238000002604 ultrasonography Methods 0.000 claims description 3
- 238000007385 chemical modification Methods 0.000 claims description 2
- 230000008711 chromosomal rearrangement Effects 0.000 claims description 2
- 230000001404 mediated effect Effects 0.000 claims description 2
- 210000003470 mitochondria Anatomy 0.000 claims description 2
- 210000001236 prokaryotic cell Anatomy 0.000 claims description 2
- 238000003260 vortexing Methods 0.000 claims description 2
- 230000002906 microbiologic effect Effects 0.000 claims 1
- 238000001556 precipitation Methods 0.000 claims 1
- 210000004027 cell Anatomy 0.000 abstract description 83
- 238000004458 analytical method Methods 0.000 abstract description 45
- 238000002487 chromatin immunoprecipitation Methods 0.000 abstract description 35
- 241000700605 Viruses Species 0.000 abstract description 5
- 210000003855 cell nucleus Anatomy 0.000 abstract description 5
- 244000005700 microbiome Species 0.000 abstract description 5
- 210000003463 organelle Anatomy 0.000 abstract description 2
- 239000000523 sample Substances 0.000 description 59
- 238000009826 distribution Methods 0.000 description 44
- 241000699666 Mus <mouse, genus> Species 0.000 description 39
- 235000018102 proteins Nutrition 0.000 description 33
- WSFSSNUMVMOOMR-UHFFFAOYSA-N Formaldehyde Chemical compound O=C WSFSSNUMVMOOMR-UHFFFAOYSA-N 0.000 description 30
- 238000013459 approach Methods 0.000 description 25
- 238000002474 experimental method Methods 0.000 description 18
- 230000003321 amplification Effects 0.000 description 17
- 238000003199 nucleic acid amplification method Methods 0.000 description 17
- 230000001605 fetal effect Effects 0.000 description 16
- 108091005904 Hemoglobin subunit beta Proteins 0.000 description 15
- 239000000872 buffer Substances 0.000 description 15
- 238000013467 fragmentation Methods 0.000 description 14
- 238000006062 fragmentation reaction Methods 0.000 description 14
- 210000001519 tissue Anatomy 0.000 description 14
- 108700009124 Transcription Initiation Site Proteins 0.000 description 13
- 108010067770 Endopeptidase K Proteins 0.000 description 12
- 102100021519 Hemoglobin subunit beta Human genes 0.000 description 12
- 238000011529 RT qPCR Methods 0.000 description 12
- 238000005516 engineering process Methods 0.000 description 12
- 238000012165 high-throughput sequencing Methods 0.000 description 12
- 239000002773 nucleotide Substances 0.000 description 12
- 125000003729 nucleotide group Chemical group 0.000 description 12
- 238000000527 sonication Methods 0.000 description 12
- 108700028369 Alleles Proteins 0.000 description 11
- 230000029087 digestion Effects 0.000 description 11
- 239000003431 cross linking reagent Substances 0.000 description 10
- 210000001671 embryonic stem cell Anatomy 0.000 description 10
- 238000001114 immunoprecipitation Methods 0.000 description 10
- 210000004185 liver Anatomy 0.000 description 10
- 238000003752 polymerase chain reaction Methods 0.000 description 10
- 241000894007 species Species 0.000 description 10
- 108091008053 gene clusters Proteins 0.000 description 9
- 239000000463 material Substances 0.000 description 9
- FAPWRFPIFSIZLT-UHFFFAOYSA-M Sodium chloride Chemical compound [Na+].[Cl-] FAPWRFPIFSIZLT-UHFFFAOYSA-M 0.000 description 8
- 238000013412 genome amplification Methods 0.000 description 8
- 239000013641 positive control Substances 0.000 description 8
- 239000000243 solution Substances 0.000 description 8
- 238000009966 trimming Methods 0.000 description 8
- 102000009572 RNA Polymerase II Human genes 0.000 description 7
- 108010009460 RNA Polymerase II Proteins 0.000 description 7
- 238000011161 development Methods 0.000 description 7
- 230000018109 developmental process Effects 0.000 description 7
- 238000010790 dilution Methods 0.000 description 7
- 239000012895 dilution Substances 0.000 description 7
- LOKCTEFSRHRXRJ-UHFFFAOYSA-I dipotassium trisodium dihydrogen phosphate hydrogen phosphate dichloride Chemical compound P(=O)(O)(O)[O-].[K+].P(=O)(O)([O-])[O-].[Na+].[Na+].[Cl-].[K+].[Cl-].[Na+] LOKCTEFSRHRXRJ-UHFFFAOYSA-I 0.000 description 7
- 239000013642 negative control Substances 0.000 description 7
- 239000002953 phosphate buffered saline Substances 0.000 description 7
- 108091026890 Coding region Proteins 0.000 description 6
- 241000196324 Embryophyta Species 0.000 description 6
- DHMQDGOQFOQNFH-UHFFFAOYSA-N Glycine Chemical compound NCC(O)=O DHMQDGOQFOQNFH-UHFFFAOYSA-N 0.000 description 6
- 206010020751 Hypersensitivity Diseases 0.000 description 6
- TWRXJAOTZQYOKJ-UHFFFAOYSA-L Magnesium chloride Chemical compound [Mg+2].[Cl-].[Cl-] TWRXJAOTZQYOKJ-UHFFFAOYSA-L 0.000 description 6
- ZYFVNVRFVHJEIU-UHFFFAOYSA-N PicoGreen Chemical compound CN(C)CCCN(CCCN(C)C)C1=CC(=CC2=[N+](C3=CC=CC=C3S2)C)C2=CC=CC=C2N1C1=CC=CC=C1 ZYFVNVRFVHJEIU-UHFFFAOYSA-N 0.000 description 6
- 108091027544 Subgenomic mRNA Proteins 0.000 description 6
- 238000006243 chemical reaction Methods 0.000 description 6
- 239000000047 product Substances 0.000 description 6
- 108091008146 restriction endonucleases Proteins 0.000 description 6
- 238000007400 DNA extraction Methods 0.000 description 5
- 108700014808 Homeobox Protein Nkx-2.2 Proteins 0.000 description 5
- 101150013773 Hoxa7 gene Proteins 0.000 description 5
- 102100038380 Myogenic factor 5 Human genes 0.000 description 5
- 101710099061 Myogenic factor 5 Proteins 0.000 description 5
- 108091028043 Nucleic acid sequence Proteins 0.000 description 5
- 238000005119 centrifugation Methods 0.000 description 5
- 238000010382 chemical cross-linking Methods 0.000 description 5
- 230000002759 chromosomal effect Effects 0.000 description 5
- 230000006870 function Effects 0.000 description 5
- 239000000499 gel Substances 0.000 description 5
- 238000011534 incubation Methods 0.000 description 5
- 238000003753 real-time PCR Methods 0.000 description 5
- 238000011160 research Methods 0.000 description 5
- 238000012360 testing method Methods 0.000 description 5
- YBJHBAHKTGYVGT-ZKWXMUAHSA-N (+)-Biotin Chemical compound N1C(=O)N[C@@H]2[C@H](CCCCC(=O)O)SC[C@@H]21 YBJHBAHKTGYVGT-ZKWXMUAHSA-N 0.000 description 4
- SHIBSTMRCDJXLN-UHFFFAOYSA-N Digoxigenin Natural products C1CC(C2C(C3(C)CCC(O)CC3CC2)CC2O)(O)C2(C)C1C1=CC(=O)OC1 SHIBSTMRCDJXLN-UHFFFAOYSA-N 0.000 description 4
- KCXVZYZYPLLWCC-UHFFFAOYSA-N EDTA Chemical compound OC(=O)CN(CC(O)=O)CCN(CC(O)=O)CC(O)=O KCXVZYZYPLLWCC-UHFFFAOYSA-N 0.000 description 4
- LFQSCWFLJHTTHZ-UHFFFAOYSA-N Ethanol Chemical compound CCO LFQSCWFLJHTTHZ-UHFFFAOYSA-N 0.000 description 4
- 241001529936 Murinae Species 0.000 description 4
- 108050002069 Olfactory receptors Proteins 0.000 description 4
- 229920004890 Triton X-100 Polymers 0.000 description 4
- 239000013504 Triton X-100 Substances 0.000 description 4
- 239000002253 acid Substances 0.000 description 4
- 238000000787 affinity precipitation Methods 0.000 description 4
- 230000015572 biosynthetic process Effects 0.000 description 4
- 230000001413 cellular effect Effects 0.000 description 4
- QONQRTHLHBTMGP-UHFFFAOYSA-N digitoxigenin Natural products CC12CCC(C3(CCC(O)CC3CC3)C)C3C11OC1CC2C1=CC(=O)OC1 QONQRTHLHBTMGP-UHFFFAOYSA-N 0.000 description 4
- SHIBSTMRCDJXLN-KCZCNTNESA-N digoxigenin Chemical compound C1([C@@H]2[C@@]3([C@@](CC2)(O)[C@H]2[C@@H]([C@@]4(C)CC[C@H](O)C[C@H]4CC2)C[C@H]3O)C)=CC(=O)OC1 SHIBSTMRCDJXLN-KCZCNTNESA-N 0.000 description 4
- 210000002257 embryonic structure Anatomy 0.000 description 4
- 229940088598 enzyme Drugs 0.000 description 4
- 238000005194 fractionation Methods 0.000 description 4
- 238000007710 freezing Methods 0.000 description 4
- 230000008014 freezing Effects 0.000 description 4
- 238000007901 in situ hybridization Methods 0.000 description 4
- 238000010348 incorporation Methods 0.000 description 4
- 210000005229 liver cell Anatomy 0.000 description 4
- 238000000746 purification Methods 0.000 description 4
- 238000011002 quantification Methods 0.000 description 4
- 230000003252 repetitive effect Effects 0.000 description 4
- 239000011780 sodium chloride Substances 0.000 description 4
- 150000003431 steroids Chemical class 0.000 description 4
- XLYOFNOQVPJJNP-UHFFFAOYSA-N water Substances O XLYOFNOQVPJJNP-UHFFFAOYSA-N 0.000 description 4
- JKMHFZQWWAIEOD-UHFFFAOYSA-N 2-[4-(2-hydroxyethyl)piperazin-1-yl]ethanesulfonic acid Chemical compound OCC[NH+]1CCN(CCS([O-])(=O)=O)CC1 JKMHFZQWWAIEOD-UHFFFAOYSA-N 0.000 description 3
- 208000027205 Congenital disease Diseases 0.000 description 3
- 108091029865 Exogenous DNA Proteins 0.000 description 3
- 108700039691 Genetic Promoter Regions Proteins 0.000 description 3
- 239000004471 Glycine Substances 0.000 description 3
- 239000007995 HEPES buffer Substances 0.000 description 3
- 206010028980 Neoplasm Diseases 0.000 description 3
- 102000012547 Olfactory receptors Human genes 0.000 description 3
- 108091034117 Oligonucleotide Proteins 0.000 description 3
- 108700005081 Overlapping Genes Proteins 0.000 description 3
- 229940124158 Protease/peptidase inhibitor Drugs 0.000 description 3
- 101000713619 Xenopus laevis Tubulin gamma-1 chain Proteins 0.000 description 3
- 238000000246 agarose gel electrophoresis Methods 0.000 description 3
- 238000003556 assay Methods 0.000 description 3
- 210000004556 brain Anatomy 0.000 description 3
- 210000004958 brain cell Anatomy 0.000 description 3
- 101150041219 ercc3 gene Proteins 0.000 description 3
- 238000000605 extraction Methods 0.000 description 3
- 230000002068 genetic effect Effects 0.000 description 3
- 238000010353 genetic engineering Methods 0.000 description 3
- 210000005260 human cell Anatomy 0.000 description 3
- 238000009396 hybridization Methods 0.000 description 3
- 238000001727 in vivo Methods 0.000 description 3
- 239000012139 lysis buffer Substances 0.000 description 3
- 229910001629 magnesium chloride Inorganic materials 0.000 description 3
- 210000001161 mammalian embryo Anatomy 0.000 description 3
- 230000007246 mechanism Effects 0.000 description 3
- 230000021121 meiosis Effects 0.000 description 3
- WSFSSNUMVMOOMR-NJFSPNSNSA-N methanone Chemical compound O=[14CH2] WSFSSNUMVMOOMR-NJFSPNSNSA-N 0.000 description 3
- 230000004048 modification Effects 0.000 description 3
- 238000012986 modification Methods 0.000 description 3
- 239000003068 molecular probe Substances 0.000 description 3
- 230000008520 organization Effects 0.000 description 3
- 239000000137 peptide hydrolase inhibitor Substances 0.000 description 3
- 230000004481 post-translational protein modification Effects 0.000 description 3
- 238000003908 quality control method Methods 0.000 description 3
- 239000002096 quantum dot Substances 0.000 description 3
- 238000005070 sampling Methods 0.000 description 3
- 230000035945 sensitivity Effects 0.000 description 3
- 238000003860 storage Methods 0.000 description 3
- 239000006228 supernatant Substances 0.000 description 3
- QKNYBSVHEMOAJP-UHFFFAOYSA-N 2-amino-2-(hydroxymethyl)propane-1,3-diol;hydron;chloride Chemical compound Cl.OCC(N)(CO)CO QKNYBSVHEMOAJP-UHFFFAOYSA-N 0.000 description 2
- 229920000936 Agarose Polymers 0.000 description 2
- IJGRMHOSHXDMSA-UHFFFAOYSA-N Atomic nitrogen Chemical compound N#N IJGRMHOSHXDMSA-UHFFFAOYSA-N 0.000 description 2
- 208000014644 Brain disease Diseases 0.000 description 2
- 241000030939 Bubalus bubalis Species 0.000 description 2
- 102000004082 Calreticulin Human genes 0.000 description 2
- 108090000549 Calreticulin Proteins 0.000 description 2
- 102000053602 DNA Human genes 0.000 description 2
- 208000031448 Genomic Instability Diseases 0.000 description 2
- 241000124008 Mammalia Species 0.000 description 2
- 241001599018 Melanogaster Species 0.000 description 2
- 108700026244 Open Reading Frames Proteins 0.000 description 2
- 238000012408 PCR amplification Methods 0.000 description 2
- 108020002230 Pancreatic Ribonuclease Proteins 0.000 description 2
- 102000005891 Pancreatic ribonuclease Human genes 0.000 description 2
- JLCPHMBAVCMARE-UHFFFAOYSA-N [3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-[[3-[[3-[[3-[[3-[[3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-hydroxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methyl [5-(6-aminopurin-9-yl)-2-(hydroxymethyl)oxolan-3-yl] hydrogen phosphate Polymers 
Cc1cn(C2CC(OP(O)(=O)OCC3OC(CC3OP(O)(=O)OCC3OC(CC3O)n3cnc4c3nc(N)[nH]c4=O)n3cnc4c3nc(N)[nH]c4=O)C(COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3CO)n3cnc4c(N)ncnc34)n3ccc(N)nc3=O)n3cnc4c(N)ncnc34)n3ccc(N)nc3=O)n3ccc(N)nc3=O)n3ccc(N)nc3=O)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cc(C)c(=O)[nH]c3=O)n3cc(C)c(=O)[nH]c3=O)n3ccc(N)nc3=O)n3cc(C)c(=O)[nH]c3=O)n3cnc4c3nc(N)[nH]c4=O)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)O2)c(=O)[nH]c1=O JLCPHMBAVCMARE-UHFFFAOYSA-N 0.000 description 2
- 150000007513 acids Chemical class 0.000 description 2
- 239000011543 agarose gel Substances 0.000 description 2
- 238000011166 aliquoting Methods 0.000 description 2
- 230000008901 benefit Effects 0.000 description 2
- 239000011230 binding agent Substances 0.000 description 2
- 229960002685 biotin Drugs 0.000 description 2
- 235000020958 biotin Nutrition 0.000 description 2
- 239000011616 biotin Substances 0.000 description 2
- 210000005013 brain tissue Anatomy 0.000 description 2
- -1 by sonication Chemical class 0.000 description 2
- 238000004364 calculation method Methods 0.000 description 2
- 201000011510 cancer Diseases 0.000 description 2
- 230000022131 cell cycle Effects 0.000 description 2
- 238000004925 denaturation Methods 0.000 description 2
- 230000036425 denaturation Effects 0.000 description 2
- 229940009976 deoxycholate Drugs 0.000 description 2
- 230000001419 dependent effect Effects 0.000 description 2
- 238000013461 design Methods 0.000 description 2
- 208000035475 disorder Diseases 0.000 description 2
- 238000001976 enzyme digestion Methods 0.000 description 2
- 238000012869 ethanol precipitation Methods 0.000 description 2
- 238000011156 evaluation Methods 0.000 description 2
- 239000000284 extract Substances 0.000 description 2
- 238000000338 in vitro Methods 0.000 description 2
- 238000011065 in-situ storage Methods 0.000 description 2
- 238000012482 interaction analysis Methods 0.000 description 2
- 230000016507 interphase Effects 0.000 description 2
- 238000004519 manufacturing process Methods 0.000 description 2
- 239000003550 marker Substances 0.000 description 2
- 239000011159 matrix material Substances 0.000 description 2
- 238000005259 measurement Methods 0.000 description 2
- 210000002569 neuron Anatomy 0.000 description 2
- 210000002353 nuclear lamina Anatomy 0.000 description 2
- 230000003071 parasitic effect Effects 0.000 description 2
- 238000002205 phenol-chloroform extraction Methods 0.000 description 2
- YBYRMVIVWMBXKQ-UHFFFAOYSA-N phenylmethanesulfonyl fluoride Chemical compound FS(=O)(=O)CC1=CC=CC=C1 YBYRMVIVWMBXKQ-UHFFFAOYSA-N 0.000 description 2
- 101150093695 pitx3 gene Proteins 0.000 description 2
- BASFCYQUMIYNBI-UHFFFAOYSA-N platinum Chemical compound [Pt] BASFCYQUMIYNBI-UHFFFAOYSA-N 0.000 description 2
- 230000008569 process Effects 0.000 description 2
- 238000004445 quantitative analysis Methods 0.000 description 2
- 230000005855 radiation Effects 0.000 description 2
- 230000008707 rearrangement Effects 0.000 description 2
- 230000010076 replication Effects 0.000 description 2
- 230000002441 reversible effect Effects 0.000 description 2
- 230000008961 swelling Effects 0.000 description 2
- 210000004881 tumor cell Anatomy 0.000 description 2
- 238000011144 upstream manufacturing Methods 0.000 description 2
- 238000010200 validation analysis Methods 0.000 description 2
- 238000011179 visual inspection Methods 0.000 description 2
- 238000012800 visualization Methods 0.000 description 2
- 238000004017 vitrification Methods 0.000 description 2
- 102000040650 (ribonucleotides)n+m Human genes 0.000 description 1
- FWMNVWWHGCHHJJ-SKKKGAJSSA-N 4-amino-1-[(2r)-6-amino-2-[[(2r)-2-[[(2r)-2-[[(2r)-2-amino-3-phenylpropanoyl]amino]-3-phenylpropanoyl]amino]-4-methylpentanoyl]amino]hexanoyl]piperidine-4-carboxylic acid Chemical compound C([C@H](C(=O)N[C@H](CC(C)C)C(=O)N[C@H](CCCCN)C(=O)N1CCC(N)(CC1)C(O)=O)NC(=O)[C@H](N)CC=1C=CC=CC=1)C1=CC=CC=C1 FWMNVWWHGCHHJJ-SKKKGAJSSA-N 0.000 description 1
- OBULAGGRIVAQEG-DFGXMLLCSA-N 5-[(3as,4s,6ar)-2-oxo-1,3,3a,4,6,6a-hexahydrothieno[3,4-d]imidazol-4-yl]pentanoic acid;[[(2r,3s,4r,5r)-5-(2,4-dioxopyrimidin-1-yl)-3,4-dihydroxyoxolan-2-yl]methoxy-hydroxyphosphoryl] phosphono hydrogen phosphate Chemical compound N1C(=O)N[C@@H]2[C@H](CCCCC(=O)O)SC[C@@H]21.O[C@@H]1[C@H](O)[C@@H](COP(O)(=O)OP(O)(=O)OP(O)(O)=O)O[C@H]1N1C(=O)NC(=O)C=C1 OBULAGGRIVAQEG-DFGXMLLCSA-N 0.000 description 1
- 108091023037 Aptamer Proteins 0.000 description 1
- 241000219194 Arabidopsis Species 0.000 description 1
- 108090001008 Avidin Proteins 0.000 description 1
- 241000894006 Bacteria Species 0.000 description 1
- 108091003079 Bovine Serum Albumin Proteins 0.000 description 1
- 238000006037 Brook Silaketone rearrangement reaction Methods 0.000 description 1
- 108010014064 CCCTC-Binding Factor Proteins 0.000 description 1
- 102000016897 CCCTC-Binding Factor Human genes 0.000 description 1
- 108091033409 CRISPR Proteins 0.000 description 1
- 206010065163 Clonal evolution Diseases 0.000 description 1
- 241001550206 Colla Species 0.000 description 1
- 102000029816 Collagenase Human genes 0.000 description 1
- 108060005980 Collagenase Proteins 0.000 description 1
- 239000004971 Cross linker Substances 0.000 description 1
- 102000012410 DNA Ligases Human genes 0.000 description 1
- 108010061982 DNA Ligases Proteins 0.000 description 1
- 230000007067 DNA methylation Effects 0.000 description 1
- 241000252212 Danio rerio Species 0.000 description 1
- 208000012239 Developmental disease Diseases 0.000 description 1
- 241000588724 Escherichia coli Species 0.000 description 1
- SXRSQZLOMIGNAQ-UHFFFAOYSA-N Glutaraldehyde Chemical compound O=CCCCC=O SXRSQZLOMIGNAQ-UHFFFAOYSA-N 0.000 description 1
- 229920002527 Glycogen Polymers 0.000 description 1
- 108020005004 Guide RNA Proteins 0.000 description 1
- 241000238631 Hexapoda Species 0.000 description 1
- 108010033040 Histones Proteins 0.000 description 1
- 208000026350 Inborn Genetic disease Diseases 0.000 description 1
- 102000003960 Ligases Human genes 0.000 description 1
- 108090000364 Ligases Proteins 0.000 description 1
- 108020004711 Nucleic Acid Probes Proteins 0.000 description 1
- 241000283973 Oryctolagus cuniculus Species 0.000 description 1
- 241000577979 Peromyscus spicilegus Species 0.000 description 1
- 208000020584 Polyploidy Diseases 0.000 description 1
- 241000288906 Primates Species 0.000 description 1
- 108010029485 Protein Isoforms Proteins 0.000 description 1
- 102000001708 Protein Isoforms Human genes 0.000 description 1
- 238000012181 QIAquick gel extraction kit Methods 0.000 description 1
- 238000003559 RNA-seq method Methods 0.000 description 1
- 241000700159 Rattus Species 0.000 description 1
- 102000006382 Ribonucleases Human genes 0.000 description 1
- 108010083644 Ribonucleases Proteins 0.000 description 1
- 102000004389 Ribonucleoproteins Human genes 0.000 description 1
- 108010081734 Ribonucleoproteins Proteins 0.000 description 1
- 240000004808 Saccharomyces cerevisiae Species 0.000 description 1
- 101000757182 Saccharomyces cerevisiae Glucoamylase S2 Proteins 0.000 description 1
- 108020004682 Single-Stranded DNA Proteins 0.000 description 1
- VMHLLURERBWHNL-UHFFFAOYSA-M Sodium acetate Chemical compound [Na+].CC([O-])=O VMHLLURERBWHNL-UHFFFAOYSA-M 0.000 description 1
- 238000002105 Southern blotting Methods 0.000 description 1
- 108010090804 Streptavidin Proteins 0.000 description 1
- 108010006785 Taq Polymerase Proteins 0.000 description 1
- 108700019146 Transgenes Proteins 0.000 description 1
- 239000007984 Tris EDTA buffer Substances 0.000 description 1
- 239000007983 Tris buffer Substances 0.000 description 1
- 108091000117 Tyrosine 3-Monooxygenase Proteins 0.000 description 1
- 102000048218 Tyrosine 3-monooxygenases Human genes 0.000 description 1
- 241000269368 Xenopus laevis Species 0.000 description 1
- 238000009825 accumulation Methods 0.000 description 1
- 230000006978 adaptation Effects 0.000 description 1
- 230000002776 aggregation Effects 0.000 description 1
- 238000004220 aggregation Methods 0.000 description 1
- 230000032683 aging Effects 0.000 description 1
- 230000019552 anatomical structure morphogenesis Effects 0.000 description 1
- 210000004102 animal cell Anatomy 0.000 description 1
- 230000033228 biological regulation Effects 0.000 description 1
- 238000007413 biotinylation Methods 0.000 description 1
- 230000006287 biotinylation Effects 0.000 description 1
- LNQHREYHFRFJAU-UHFFFAOYSA-N bis(2,5-dioxopyrrolidin-1-yl) pentanedioate Chemical compound O=C1CCC(=O)N1OC(=O)CCCC(=O)ON1C(=O)CCC1=O LNQHREYHFRFJAU-UHFFFAOYSA-N 0.000 description 1
- 239000008366 buffered solution Substances 0.000 description 1
- 230000000981 bystander Effects 0.000 description 1
- 238000004422 calculation algorithm Methods 0.000 description 1
- 230000015556 catabolic process Effects 0.000 description 1
- 238000004113 cell culture Methods 0.000 description 1
- 230000005779 cell damage Effects 0.000 description 1
- 230000030833 cell death Effects 0.000 description 1
- 208000037887 cell injury Diseases 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 238000012512 characterization method Methods 0.000 description 1
- 239000003153 chemical reaction reagent Substances 0.000 description 1
- 239000003795 chemical substances by application Substances 0.000 description 1
- YTRQFSDWAXHJCC-UHFFFAOYSA-N chloroform;phenol Chemical compound ClC(Cl)Cl.OC1=CC=CC=C1 YTRQFSDWAXHJCC-UHFFFAOYSA-N 0.000 description 1
- 210000003763 chloroplast Anatomy 0.000 description 1
- 238000004587 chromatography analysis Methods 0.000 description 1
- 238000003776 cleavage reaction Methods 0.000 description 1
- 238000010367 cloning Methods 0.000 description 1
- 229960002424 collagenase Drugs 0.000 description 1
- 239000003086 colorant Substances 0.000 description 1
- 238000005056 compaction Methods 0.000 description 1
- 230000000295 complement effect Effects 0.000 description 1
- 238000012790 confirmation Methods 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 238000011109 contamination Methods 0.000 description 1
- 239000013068 control sample Substances 0.000 description 1
- 238000012937 correction Methods 0.000 description 1
- 238000009646 cryomilling Methods 0.000 description 1
- 230000001186 cumulative effect Effects 0.000 description 1
- 238000005520 cutting process Methods 0.000 description 1
- 230000001351 cycling effect Effects 0.000 description 1
- 230000009089 cytolysis Effects 0.000 description 1
- 230000006378 damage Effects 0.000 description 1
- 230000003247 decreasing effect Effects 0.000 description 1
- 230000007850 degeneration Effects 0.000 description 1
- 238000006731 degradation reaction Methods 0.000 description 1
- 238000009795 derivation Methods 0.000 description 1
- 238000003745 diagnosis Methods 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 230000004069 differentiation Effects 0.000 description 1
- 238000007865 diluting Methods 0.000 description 1
- 238000002224 dissection Methods 0.000 description 1
- 230000003291 dopaminomimetic effect Effects 0.000 description 1
- 238000010828 elution Methods 0.000 description 1
- 230000003628 erosive effect Effects 0.000 description 1
- 210000000267 erythroid cell Anatomy 0.000 description 1
- ZMMJGEGLRURXTF-UHFFFAOYSA-N ethidium bromide Chemical compound [Br-].C12=CC(N)=CC=C2C2=CC=C(N)C=C2[N+](CC)=C1C1=CC=CC=C1 ZMMJGEGLRURXTF-UHFFFAOYSA-N 0.000 description 1
- 229960005542 ethidium bromide Drugs 0.000 description 1
- DEFVIWRASFVYLL-UHFFFAOYSA-N ethylene glycol bis(2-aminoethyl)tetraacetic acid Chemical compound OC(=O)CN(CC(O)=O)CCOCCOCCN(CC(O)=O)CC(O)=O DEFVIWRASFVYLL-UHFFFAOYSA-N 0.000 description 1
- 230000001815 facial effect Effects 0.000 description 1
- 239000012894 fetal calf serum Substances 0.000 description 1
- 239000000835 fiber Substances 0.000 description 1
- 238000001943 fluorescence-activated cell sorting Methods 0.000 description 1
- 230000002538 fungal effect Effects 0.000 description 1
- 238000001502 gel electrophoresis Methods 0.000 description 1
- 208000016361 genetic disease Diseases 0.000 description 1
- 239000011521 glass Substances 0.000 description 1
- 229940096919 glycogen Drugs 0.000 description 1
- 210000003917 human chromosome Anatomy 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 230000002779 inactivation Effects 0.000 description 1
- 210000004263 induced pluripotent stem cell Anatomy 0.000 description 1
- 238000013101 initial test Methods 0.000 description 1
- 238000003780 insertion Methods 0.000 description 1
- 230000037431 insertion Effects 0.000 description 1
- 238000007689 inspection Methods 0.000 description 1
- 230000010354 integration Effects 0.000 description 1
- 239000013067 intermediate product Substances 0.000 description 1
- 238000002372 labelling Methods 0.000 description 1
- 239000003446 ligand Substances 0.000 description 1
- 230000000670 limiting effect Effects 0.000 description 1
- 239000007788 liquid Substances 0.000 description 1
- 210000005228 liver tissue Anatomy 0.000 description 1
- 230000007774 longterm Effects 0.000 description 1
- 238000007403 mPCR Methods 0.000 description 1
- 210000004962 mammalian cell Anatomy 0.000 description 1
- 108020004999 messenger RNA Proteins 0.000 description 1
- 230000031864 metaphase Effects 0.000 description 1
- 230000000813 microbial effect Effects 0.000 description 1
- 238000000386 microscopy Methods 0.000 description 1
- 230000003278 mimic effect Effects 0.000 description 1
- 208000024191 minimally invasive lung adenocarcinoma Diseases 0.000 description 1
- 230000035772 mutation Effects 0.000 description 1
- 238000007857 nested PCR Methods 0.000 description 1
- 229910052757 nitrogen Inorganic materials 0.000 description 1
- 239000002853 nucleic acid probe Substances 0.000 description 1
- 238000010422 painting Methods 0.000 description 1
- 239000002245 particle Substances 0.000 description 1
- 239000008188 pellet Substances 0.000 description 1
- 230000001935 permeabilising effect Effects 0.000 description 1
- 238000000053 physical method Methods 0.000 description 1
- 230000004962 physiological condition Effects 0.000 description 1
- 229910052697 platinum Inorganic materials 0.000 description 1
- 210000001778 pluripotent stem cell Anatomy 0.000 description 1
- 229920002401 polyacrylamide Polymers 0.000 description 1
- 230000008488 polyadenylation Effects 0.000 description 1
- 238000004321 preservation Methods 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 230000006798 recombination Effects 0.000 description 1
- 238000005215 recombination Methods 0.000 description 1
- 238000011084 recovery Methods 0.000 description 1
- 230000002829 reductive effect Effects 0.000 description 1
- 230000031267 regulation of DNA replication Effects 0.000 description 1
- 230000014493 regulation of gene expression Effects 0.000 description 1
- 230000008439 repair process Effects 0.000 description 1
- 230000008672 reprogramming Effects 0.000 description 1
- 230000007017 scission Effects 0.000 description 1
- 238000012216 screening Methods 0.000 description 1
- 238000013515 script Methods 0.000 description 1
- 238000010187 selection method Methods 0.000 description 1
- 235000004400 serine Nutrition 0.000 description 1
- 150000003355 serines Chemical class 0.000 description 1
- 239000011734 sodium Substances 0.000 description 1
- 239000001632 sodium acetate Substances 0.000 description 1
- 235000017281 sodium acetate Nutrition 0.000 description 1
- 210000004872 soft tissue Anatomy 0.000 description 1
- 239000007787 solid Substances 0.000 description 1
- 210000000130 stem cell Anatomy 0.000 description 1
- 239000011550 stock solution Substances 0.000 description 1
- 210000001768 subcellular fraction Anatomy 0.000 description 1
- 210000003523 substantia nigra Anatomy 0.000 description 1
- 239000000725 suspension Substances 0.000 description 1
- 238000003786 synthesis reaction Methods 0.000 description 1
- 230000009897 systematic effect Effects 0.000 description 1
- 230000008685 targeting Effects 0.000 description 1
- ABZLKHKQJHEPAX-UHFFFAOYSA-N tetramethylrhodamine Chemical compound C=12C=CC(N(C)C)=CC2=[O+]C2=CC(N(C)C)=CC=C2C=1C1=CC=CC=C1C([O-])=O ABZLKHKQJHEPAX-UHFFFAOYSA-N 0.000 description 1
- 230000001225 therapeutic effect Effects 0.000 description 1
- 210000001541 thymus gland Anatomy 0.000 description 1
- 238000004448 titration Methods 0.000 description 1
- 101150037438 tpm gene Proteins 0.000 description 1
- 230000005030 transcription termination Effects 0.000 description 1
- 230000002103 transcriptional effect Effects 0.000 description 1
- 230000009261 transgenic effect Effects 0.000 description 1
- LENZDBCJOHFCAS-UHFFFAOYSA-N tris Chemical compound OCC(N)(CO)CO LENZDBCJOHFCAS-UHFFFAOYSA-N 0.000 description 1
- 230000003612 virological effect Effects 0.000 description 1
- 238000012070 whole genome sequencing analysis Methods 0.000 description 1
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6813—Hybridisation assays
- C12Q1/6816—Hybridisation assays characterised by the detection means
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6813—Hybridisation assays
- C12Q1/6827—Hybridisation assays for detection of mutation or polymorphism
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/156—Polymorphic or mutational markers
Definitions
- the present invention relates to the field of analysis of the three-dimensional structure of the genome, i.e., for genome architecture mapping on chromatin (GAM-ch).
- the invention provides a method of determining interaction of a plurality of nucleic acid loci in a compartment comprising nucleic acids, such as the cell nucleus, comprising separating nucleic acids from each other depending on their interaction in the compartment by crosslinking nucleic acids with each other directly or indirectly, fragmenting the nucleic acids of the compartment to obtain fragments and/or cross-linked complexes of fragments, and dividing the fragmented nucleic acids to obtain a collection of fractions such that every fraction contains, on average, less than one copy of every locus; determining the presence or absence of the plurality of loci in said fractions; and determining the co-segregation of said plurality of loci in the fractions.
- Co-segregation may then be analysed with statistical methods to determine interactions.
- the method can be used e.g., for identifying the frequency of interactions across a cell population between a plurality of loci; and mapping loci and/or genome architecture, e.g., in the nucleus, an organelle, a microorganism or a virus; identification of regulatory regions (enhancers) directing expression of a specific gene through spatial contacts; identifying the spatial contacts between loci that depend on their co-association with specific protein(s), or RNA, and/or diagnosing a disease associated with a disturbed co-segregation of loci.
- Chromatin immunoprecipitation (ChIP) can be combined with the method of the invention.
- Information about the three-dimensional structure of chromatin is also of high interest, in particular, to discover contacts between regulatory regions (e.g. enhancers) and gene promoters which may be disrupted in disease due to genetic mutations in the non-coding part of the genome (e.g. Uslu V.V. et al. 2014 Long-range enhancers regulating Myc expression are required for normal facial morphogenesis. Nature Genetics 46: 753).
- Studying the structural properties and spatial organization of chromosomes is important for the understanding and evaluation of the regulation of gene expression, DNA replication and repair, and recombination.
- the folding of chromosomes and their contacts has important implications for disease mechanisms and elucidation of targets for therapeutic approaches, e.g., in cancer or congenital diseases.
- Chromatin exists in interacting and non-interacting states. Interacting states have different properties depending on the characteristics of the genomic sites, or binding sites, involved in the interactions, namely (a) their number, distance and distribution, (b) their specificity and affinity for binders, and (c) the concentration and specificity of binders. Chromatin interactions can also involve different numbers of loci associating simultaneously (multiplicity of interaction).
- Fluorescence in situ hybridization uses microscopy to directly measure spatial distances between genomic loci, but it can only be applied to the study of a small number of genomic regions at a time in the same nucleus (e.g., Pombo A. 2003. Cellular genomics: which genes are transcribed when and where? Trends Biochem. Sci. 28, 6). It is theoretically possible to re-probe the same cells or tissue sections with different sets of probes, but there are concerns that repeated re-probing causes structural artefacts, e.g. due to DNA denaturation necessary to dissociate subsequent sets of probes, that e.g. induce artificial aggregation (contacts) of loci (i.e.
- RNA-FISH is a milder FISH approach that does not involve DNA denaturation but that can only be used to determine the nuclear position of actively transcribed genes (not silent genes). Samples from cells in the interphase stage of the cell cycle, where functional chromatin contacts are most often mapped, can be re-probed for RNA-FISH only about three times, although the preservation of structure has not been measured in detail.
- the number of probes which can be simultaneously applied in either DNA- or RNA-FISH is limited by distinguishable fluorescent markers, e.g. 181 barcodes can in principle be obtained by combining five colours, four colour ratios and two different levels of intensity (Pombo A. 2003. Cellular genomics: which genes are transcribed when and where? Trends Biochem. Sci. 28, 6).
- this approach fails when the loci analysed are so close in space that the combination of fluorochromes in one probe is not distinguishable from the combination in another, and is therefore not amenable to the identification of loci that are spatially proximal at very short distances.
- FISH can only be applied to analyse interactions of known loci of interest, and not to discover e.g. the presence of an exogenous DNA sequence in an interaction with the host's DNA.
- the approach fails e.g. in the detection of endogenous or exogenous DNA sequences, unless they are known a priori, e.g. viral subtype integration positions and the exact sequences of exogenous DNA.
- FISH is also confounded by a priori assumptions of linear genome organisation, which are not acceptable to study chromatin positioning features, e.g. chromatin contacts, when e.g. the influence of natural variation in genomic sequence in organism populations is of interest, e.g. in studying human samples, due to the fact that FISH does not inherently detect sequence variations such as copy number variations, or genomic rearrangements, without a priori probe design or a priori whole genome sequencing of the sample followed by probe design.
- 3C-based methods generally start with chemical crosslinking of proteins that mediate genomic contacts. After chromatin extraction, pieces of DNA bound by the crosslinked proteins and RNAs are treated with a restriction enzyme for fragmentation. Addition of a ligase then connects (ligates) two pieces of DNA.
- 3C uses different methods of detecting such ligation events: a popular one is paired-end sequencing (Hi-C, 4C-seq, ChIA-PET), and in one embodiment the DNA bound by a specific protein (or molecule) is purified before the ligation step.
- the present inventors addressed the problem of providing an improved method for determining the interaction of nucleic acids, which avoids bias based on ligation of fragmented nucleic acids for detection of nucleic acids interactions, and which allows for simultaneous analysis of several high multiplicity interactions (each involving more than two loci), in particular, more than two interactions.
- In one embodiment, the method allows for simultaneous analysis of substantially all nucleic acid interactions in the genome; in another, the method allows for simultaneous analysis of all nucleic acid interactions of fragments bound by a given molecule of interest, such as a protein or RNA.
- This problem is solved by the method of the invention, as described below and in the claims. This method is designated Genome Architecture Mapping on Chromatin (GAM-ch).
- the present invention provides a method of determining interaction of a plurality of nucleic acid loci in a compartment comprising nucleic acids, comprising steps of
- (a) separating nucleic acids from each other depending on their interaction in the compartment by (i) crosslinking nucleic acids with each other directly or indirectly, (ii) fragmenting the nucleic acids of the compartment to obtain fragments and/or cross-linked complexes of fragments, e.g. by the use of sonication, mechanical shearing or restriction enzyme digestion, and (iii) dividing the fragmented nucleic acids to obtain a collection of fractions such that every fraction contains, on average, less than one copy of every locus (e.g. about 0.5 copies or one copy in every other fraction), wherein steps (i) and (ii) can be carried out simultaneously or in any order;
- fragments bound by a given molecule of interest are selected, e.g. by chromatin immunoprecipitation (ChIP), as described in more detail below.
- a locus is the specific location of a gene, DNA sequence, or position on a chromosome (Wikipedia). Each chromosome carries many genes; the number of protein coding genes in the haploid human genome is estimated to be 20,000-25,000, on the 23 different chromosomes; there are as many transcription units which produce RNA species that do not encode for proteins. A variant of the similar DNA sequence located at a given locus is called an allele.
- the nucleic acid may be DNA or RNA or a combination of both, e.g., if interactions between genes being actively transcribed and other genomic regions are to be analysed. Usually, the method of the invention is used to analyse co-segregation of DNA.
- the co-segregation of loci may be analysed in any compartment comprising nucleic acids, such as the nucleus of a eukaryotic cell, a mitochondrion, a chloroplast, a prokaryotic cell or a virus.
- co-segregation of nucleic acid loci, in particular DNA loci, in the nucleus of a eukaryotic cell may be analysed.
- the method of the invention thus constitutes a solution to analyse locus proximity or interaction in the nucleus, by measuring the frequency with which loci co-segregate in cross-linked DNA complexes extracted from nuclei.
- the cell or particle from which the compartment is derived may be a virus, a bacterium, a protozoan, a plant cell, a fungal cell or an animal cell, e.g., a mammalian cell, such as a cell from a patient (preferably, a human patient) having a disease or a disorder, or being diagnosed for a disorder, or a healthy subject.
- the cell may be a tumor cell or a stem cell, such as an induced pluripotent stem cell generated, e.g., through reprogramming of human tissues.
- Such cells can advantageously be used to apply GAM-ch to study human developmental disorders or congenital disease.
- If the cell is an embryonic stem cell, it is preferably not generated in a method involving destruction of a human embryo. A plurality of cells/compartments or single cells may be analysed with the method of the invention.
- the mammal preferably is a human, but it may also be of interest to investigate, and, optionally, compare the genomic architecture of other organisms, such as E. coli, yeast, A. thaliana, C. elegans, X. laevis, D. rerio, D. melanogaster, mouse, rat or primate, or possibly parasitic interactions, e.g. the proximity of parasitic nucleic acids relative to the host genome, such as the chromatin contacts a virus (e.g. HIV, HSV) makes with the host DNA, or of an artificially inserted nucleic acid (e.g. in the context of gene therapy).
- Cells can be derived from cell culture or analysed ex vivo from a specific tissue from a living organism or a dead organism, i.e., post-mortem, or from a whole experimental organism (e.g. a whole D. melanogaster embryo or C. elegans embryo), or from a mixture of microorganisms.
- Cells used in the analysis can be selected, e.g., by synchronizing the cells in a particular stage of the cell cycle, or sorting the cells e.g. by fluorescence activated cell sorting to capture a particular cell type expressing a specific marker, e.g., using an antibody specific for a protein uniquely expressed in the cell type or cell stage of interest, or detected by in situ hybridization, e.g. with a nucleic acid probe that detects a specific mRNA, or other RNA, expressed specifically in the cell type of interest, or a fluorescent marker such as GFP showing expression of a specific gene or characteristic of a specific stage.
- a GFP transgene under the control of the promoter of the Pitx3 transcription factor can be used to mark dopamine-expressing neurons (Maxwell S. et al. 2005. Pitx3 regulates tyrosine hydroxylase expression in the substantia nigra and identifies a subgroup of mesencephalic dopaminergic progenitor neurons during mouse development. Dev. Biol. 282 (2): 467-479).
- Cells can be pre-treated with an agent, e.g., to test the effect of drugs on co-segregation or positioning of loci, or be studied during the lifetime of an organism to understand development, ageing and degeneration.
- a suspension of single cells is prepared before step (a), depending on the species and type of tissue, e.g., a single cell suspension of mammalian solid tissues may be prepared.
- Preparation of a single cell suspension may be carried out by any procedure that is also compatible with 3C technologies. Detailed descriptions of several single cell preparations compatible with the production of a chromatin sample that preserves crosslinked chromatin contacts can be found in e.g. Hagege H. et al. 2007. Quantitative analysis of chromosome conformation capture assays (3C-qPCR). Nature Protocols 2, 1722.
- the preparation of a single cell suspension may start by tissue dissection, followed by treatment with collagenase, or, for soft tissues (e.g. mouse thymus or fetal liver), by passage of tissue through a cell strainer (e.g. 40 micrometer mesh), or in the case of cells grown in in vitro culture or microorganism cultures, through centrifugation of the culture at appropriate force for the cell type, followed by resuspension at appropriate strength to yield a single cell suspension with minimal cell damage or death.
- Application to post-mortem samples is also possible using published protocols or developments thereafter (Mitchell A.C. et al. 2014. The genome in three dimensions: a new frontier in human brain disease. Biol. Psychiatry 75, 961).
- the separation of nucleic acids from each other in step (a) is carried out by (i) crosslinking nucleic acids with each other directly or indirectly, i.e., DNA and/or RNA may be cross-linked directly or through proteins interacting with the nucleic acid, using e.g. chemical crosslinking agents such as formaldehyde, (ii) fragmenting the nucleic acids of the compartment to obtain fragments and/or complexes of cross-linked fragments of nucleic acids, e.g. by sonication, and (iii) dividing the nucleic acids into fractions to obtain a collection of fractions each containing a plurality of fragments and/or complexes of cross-linked fragments, such that every fraction contains, on average, less than one copy of every locus.
- Nuclei, cells, tissues or whole organisms are treated with a crosslinking agent, e.g. a chemical crosslinking agent in step (a) (i).
- the crosslinking agent induces linkage of proteins with each other and between nucleic acids (DNA and/or RNA) and proteins.
- the method of the invention is compatible with cross-linking conditions that are also compatible with current 3C-based methods.
- the crosslinking agent comprises formaldehyde or another crosslinking agent compatible with DNA extraction.
- Formaldehyde will preferably be used, at a concentration of 0.5-4%, preferably, about 1%-2% (all w/w), e.g., in a buffered solution, e.g., of PBS pH 7.0-8.0, or by addition of a concentrated solution of the cross-linking agent directly to cell medium, preferably for 5-120 min, preferably 10-20 min.
- Alternative cross-linkers are, e.g., disuccinimidylglutarate, dithiobis-succinimidyl propionate, glutaraldehyde.
- Crosslinking may also be performed by UV radiation.
- fixed nuclei or cells can be pelleted and stored frozen, e.g., at -20°C, or -70°C or -80°C, e.g. in 1% formaldehyde.
- Steps (i) and (ii) may be carried out at the same time or in any order.
- Preferably, crosslinking is performed as early as possible to keep the structure of chromatin as intact as possible, i.e., it is usually performed first.
- Step (a) of the method may further comprise, e.g., permeabilisation of cells by a lysis buffer and/or freezing.
- the crosslinking can, e.g., be done directly on cells and then followed by permeabilisation, e.g., lysis with a suitable lysis buffer, and/or, freezing, and then fragmentation, e.g., by restriction (see Hagege et al. 2007).
- crosslinking and permeabilising can be performed at the same time.
- the fragmenting in step (a)(ii) can be carried out by any method, which preferably leads to formation of fragments of homogenous length, or randomly and evenly-spaced breaks in the nucleic acids.
- fragmentation can be done by ultrasound, by mechanical shearing, by Dounce homogenisation, vortexing with glass beads, or by restriction digest, or a combination of two or more of these methods.
- Physical methods such as ultrasound or shearing can be adapted to yield fragments or complexes of fragments of a desired fragment size, which may vary depending on the tissue and/or cell analysed.
- Preferred average fragment size depends on the resolution with which chromatin interactions are aimed to be mapped (which depends on the organism and on the aims) and is about 100 bp-5 Mbp, or preferably, 200 bp-500 kbp or 1 kbp-5 kbp.
- the average chromatin loop size is about 100 kbp.
- Promoter contacts with regulatory regions are often local, below 50 kbp, so an appropriate resolution needs to be chosen.
- Dounce homogenisation can be performed using e.g. 100 mg tissue in (a) 2 mL 1X PBS (phosphate buffered saline) or another suitable buffer, and (b) 200 µL protease inhibitor (Mitchell A.C. et al. 2014. The genome in three dimensions: a new frontier in human brain disease. Biol. Psychiatry 75, 961).
- vitrification i.e. rapid freezing
- Although restriction digestion may be considered to introduce some bias into the formation of fragments, it may be acceptable if this bias is taken into account in the analysis of results.
- frequently cutting restriction enzymes may be used, or a combination of enzymes recognizing different restriction sites e.g., two, three or four different restriction enzymes, may be used.
- a restriction digest with the enzymes HindIII, NcoI, EcoRI or BglII (6-base cutters) or DpnII or NlaIII (4-base cutters) may be carried out e.g. for 60 min, or overnight at 37°C and will provide different fragment sizes depending on the genomic distribution of the restriction sequence.
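The dependence of fragment size on the recognition sequence can be illustrated with a minimal sketch. It assumes a random sequence with equal base frequencies, which real genomes do not satisfy, so the numbers are rough expectations only. The function names and the `cut_offset` parameter are hypothetical and introduced here for illustration; the example site GATC is the DpnII recognition sequence, cut before the G.

```python
from typing import List


def expected_fragment_length(site_length: int) -> int:
    """Mean spacing (bp) between sites of a given length, assuming a
    uniformly random sequence with equal base frequencies."""
    return 4 ** site_length


def digest(sequence: str, site: str, cut_offset: int = 0) -> List[str]:
    """Cut `sequence` at every occurrence of `site`.

    `cut_offset` is the cut position counted from the start of the site
    (0 cuts immediately before the site).
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        if cut > start:
            fragments.append(sequence[start:cut])
            start = cut
        pos = sequence.find(site, pos + 1)
    if start < len(sequence):
        fragments.append(sequence[start:])
    return fragments


if __name__ == "__main__":
    print(expected_fragment_length(6))  # 6-base cutter
    print(expected_fragment_length(4))  # 4-base cutter
    print(digest("AAGATCTTGATCCC", "GATC"))
```

Under this simplification a 4-base cutter cuts roughly sixteen times more often than a 6-base cutter, which is one reason frequently cutting enzymes give finer fragments.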
- step (a) (iii) can be preceded by an additional step (a) (iii.0) comprising selection of fragments/complexes of fragments that are bound by a given molecule of interest, in particular a given protein, a given protein post-translational modification, a given RNA (if fragments are DNA) or a given DNA (if fragments are RNA), or a chemical modification of DNA (e.g. DNA methylation) or RNA, or a given protein/nucleic acid complex, or, after targeting a locus with Cas9 complex with guide RNAs.
- the given molecule of interest is a protein that is bound to chromatin at the time that chromatin forms contacts.
- Said selection may be carried out by an affinity-based method such as affinity precipitation, e.g. by performing a chromatin immunoprecipitation or pull down using antibodies or other affinity molecules (e.g. aptamer), followed by dividing/aliquoting e.g. the 'beads' used for pull down.
- Although affinity precipitation with antibodies is preferred, other affinity-based selection methods can also be employed, e.g. biotin binding to avidin or derivatives such as streptavidin, e.g., after labelling of chromatin using in vivo biotinylation, or incorporation of biotin into specific nucleic acid sequences, e.g. after in situ incorporation of Biotin-UTP or Biotin-dUTP into nascent RNA or nascent DNA, respectively.
- Specific nucleic acids may also be selected by use of hybridizing nucleic acids for selection, e.g., by affinity precipitation. Affinity precipitation can be substituted for by passage over columns comprising a ligand specific for the molecule of interest.
- Chromatin Immunoprecipitation can be employed (e.g., Collas, 2010. The current state of chromatin immunoprecipitation. Molecular Biotechnology 45(1):87-100; Stock et al. 2007; Brookes et al. 2012). Suitable conditions for specific interaction with the molecule of interest are employed, e.g., conditions for stringent hybridization. Methods disclosed in WO 2014/14152397 A2 may be employed.
- In step (a)(iii), the nucleic acids in the preparation resulting from the previous steps, e.g., directly from step (a)(ii) or from step (a)(iii.0), are divided (or aliquoted) into fractions to obtain a collection of fractions such that every fraction contains, on average, less than one copy of every locus (e.g. 0.0001-0.9, 0.01-0.7, 0.1-0.6, 0.4-0.5, preferably, about 0.5 copies, i.e. one copy in every other fraction).
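The relation between the mean copy number per fraction and the share of fractions in which a locus is seen can be sketched as follows. This is an illustration only and assumes that copies of a locus distribute independently among fractions (a Poisson model), which is not stated in the text; the function names are hypothetical. Under this assumption, a mean of 0.5 copies per fraction puts the locus in about 39% of fractions, and a mean of about 0.69 copies would be needed for it to appear in every other fraction.

```python
import math


def fraction_with_locus(mean_copies: float) -> float:
    """Share of fractions holding at least one copy of a locus,
    assuming Poisson-distributed copy numbers (an assumption)."""
    return 1.0 - math.exp(-mean_copies)


def mean_copies_for_detection(target_share: float) -> float:
    """Mean copies per fraction needed so that `target_share` of the
    fractions contain the locus, under the same Poisson assumption."""
    if not 0.0 <= target_share < 1.0:
        raise ValueError("target_share must be in [0, 1)")
    return -math.log(1.0 - target_share)


if __name__ == "__main__":
    for mean in (0.1, 0.5, 0.9):
        print(mean, round(fraction_with_locus(mean), 3))
    print(round(mean_copies_for_detection(0.5), 3))
```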
- one locus is seen in every other fraction (i.e. in 50% of the fractions), or in 40% or 30% or 10% or 5% of fractions.
- the number of fractions depends on the approximate number of loci and the genomic resolution at which the assay will be carried out (i.e. it depends on the total genome length of the organism under study and the length of the loci for which contacts are measured, in other words on the resolution).
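How genome length and resolution set the size of the problem can be made concrete with a short sketch. It assumes that loci are defined as equal-sized, non-overlapping windows, which is one possible way to define them and is not prescribed by the text; the genome length used in the example is a round illustrative figure, and the function names are hypothetical.

```python
def count_windows(genome_length_bp: int, resolution_bp: int) -> int:
    """Number of equal-sized windows (loci) needed to tile the genome,
    rounding up so that the last partial window is counted."""
    return -(-genome_length_bp // resolution_bp)


def count_pairs(n_loci: int) -> int:
    """Number of distinct locus pairs whose co-segregation can be scored."""
    return n_loci * (n_loci - 1) // 2


if __name__ == "__main__":
    # Illustrative 3 Gbp genome at 1 Mbp and 100 kbp resolution.
    for resolution in (1_000_000, 100_000):
        n = count_windows(3_000_000_000, resolution)
        print(resolution, n, count_pairs(n))
```

Because the number of pairs grows with the square of the number of loci, a tenfold finer resolution gives roughly a hundredfold more pairs to estimate, which is why more fractions are needed at higher resolution.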
- the nucleic acids are separated into many fractions.
- the number of fractions depends on whether only pairwise or multiple contacts are to be found between loci, and on whether only the most frequent contacts (interactions) (e.g. frequency above 50% across the cell population) or also the least frequent contacts (e.g. 5%) are to be identified.
- step (a) (iii.0) is used to reduce the complexity of the sample. If step (a) (iii.0) is used, analysis of about 180 fractions (or more) already provides meaningful results.
- the nucleic acid (often DNA) content of the fractions should be homogenous for the whole analysis, but non-homogenous fractions (e.g. fractions that have excessive DNA content) may be excluded a posteriori once nucleic acid content is mapped; e.g., if using fractions that are supposed to contain approximately 30% of genomic DNA coverage on average, any tubes that contain more than 40% or less than 20% coverage can be excluded, or analysed separately, upon DNA detection.
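The a posteriori exclusion of fractions by coverage can be sketched as a simple filter. The 20% and 40% bounds are the example values from the text; the function name, the dictionary layout and the tube names are hypothetical and chosen only for illustration.

```python
from typing import Dict, Tuple


def split_by_coverage(
    coverage: Dict[str, float], low: float = 0.20, high: float = 0.40
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Split fractions into those kept and those set aside.

    `coverage` maps a fraction (tube) name to the share of the genome
    detected in it, as a value between 0 and 1.
    """
    kept: Dict[str, float] = {}
    set_aside: Dict[str, float] = {}
    for name, value in coverage.items():
        if low <= value <= high:
            kept[name] = value
        else:
            set_aside[name] = value
    return kept, set_aside


if __name__ == "__main__":
    # Made-up coverage values for four tubes.
    example = {"tube1": 0.31, "tube2": 0.55, "tube3": 0.12, "tube4": 0.28}
    kept, set_aside = split_by_coverage(example)
    print(sorted(kept))
    print(sorted(set_aside))
```

The fractions set aside are returned rather than discarded, so that they can be analysed separately as the text allows.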
- These fractions may be obtained from a plurality of cells (or nucleic acid containing cellular compartments) or from single cells.
- the separation into fractions is preferably done after homogenous division of the fragments and/or cross-linked complexes of fragments.
- some fractions will, statistically, contain one or more copies of all possible loci that cover the given genome. This may be found in different situations, firstly, when the preparation of fractions of the compartment leads to fractions with very heterogeneous content in terms of number of fragments (e.g. a large chunk of chromatin; Gavrilov A. A. et al. 2013. Disclosure of a structural milieu for the proximity ligation reveals the elusive nature of an active chromatin hub. Nucleic Acids Res. 41, 3563-75). This is an artefact, which can be detected and disregarded in the analysis. Furthermore, this may happen when the two alleles in a cell interact so closely that they appear in the same fraction. When loci are identified by sequencing, this is not a problem, as it can be measured based on sequence differences due to SNP variation between alleles.
- the presence or absence of the plurality of loci may be determined by e.g., polymerase chain reaction (PCR), or preferably, by sequencing, preferably, by next generation sequencing and eventually by single molecule sequencing techniques under development.
- the nucleic acids of loci in the fraction are sequenced substantially or completely. This is of particular interest if the method is carried out to detect possible interactions between different loci in a research setting, and a "normal" co-segregation pattern has not yet been established for the cell type of interest in the physiological conditions used.
- the method of the invention may thus be used to analyse spatial proximity (and, consequently, interactions) of unknown and/or unspecified loci, or of transgenic loci inserted in the genome (e.g. in gene therapy) to study their effects on chromatin contacts.
- the method can be used to detect specific (and new) species, as the DNA in cells of each species crosslinks with DNA from each species, and is more often found co-segregated.
- nucleic acids such as DNA may be analysed by crosslinking, nuclear fractionation (optional), fragmentation (i.e. chromatin preparation or preparation of nucleic acid complexes), dilution and separation into fractions or sub-samples, followed by amplification using single-cell whole genome amplification (WGA; Baslan, T. et al. 2012. Genome-wide copy number analysis of single cells. Nat. Protoc. 7: 1024) (Fig. 4A).
- WGA-amplified DNA may be sequenced, e.g., using Illumina HiSeq technology. Visual inspection of tracks from single fractions shows that each contains a different complement of sub-chromosomal regions of expected size (Fig. 6, Fig. 14), as expected from sequencing a sub-cellular fraction of chromatin containing fragment lengths of a given genomic length.
- each fraction contains only a restricted subset of sequences from each chromosome (Fig. 15B).
- presence or absence of a specific interaction has previously been investigated, so the interacting loci of interest are already known.
- a significant difference in the frequency with which two loci interact may have been found between different patient groups (e.g., healthy subjects and subjects having a disease, such as a tumor or a congenital disease).
- presence or absence of the two (or more) loci of interest can also be determined by specific PCR, or by otherwise specifically checking for their presence, e.g., by Southern blot or by Illumina HiSeq technology, after selection of nucleic acids covering locus of interest, e.g.
- GAM-ch thus preferably combines single copy locus fractionation of a crosslinked chromatin preparation with DNA detection (e.g. by whole genome amplification and next generation sequencing).
- because chromatin is crosslinked, loci that are closer to each other in the nuclear space (but not necessarily on the linear genome) are found together in the single molecule fraction more frequently than distant loci (i.e. they co-segregate more frequently, Fig. 2).
- the frequency of contacts between genomic loci can then be inferred by scoring the presence or absence of loci among a number of aliquots containing a sub-genome sample of fragments (Fig. 2).
- the resulting table can be used to compute the co-segregation frequency of each locus against every other locus to create a matrix of inferred contact frequencies between loci. Therefore, GAM allows for the calculation of chromatin contacts genome-wide without the need for end-to-end ligation between the interacting fragments.
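- The step from a presence/absence table to a co-segregation matrix can be sketched as follows. This is a minimal illustration, not the implementation used in the examples: the function name `cosegregation_matrix`, the locus labels and the toy fractions are all hypothetical, and the frequency is simply taken as the share of fractions in which both loci are detected.

```python
from itertools import combinations


def cosegregation_matrix(fractions, loci):
    """Return {(a, b): frequency} for every pair of loci.

    `fractions` is a list of sets, each holding the loci detected in one
    fraction (tube). The frequency is the share of fractions in which both
    loci of the pair were detected. Names and data layout are assumptions.
    """
    n = len(fractions)
    matrix = {}
    for a, b in combinations(loci, 2):
        both = sum(1 for detected in fractions if a in detected and b in detected)
        matrix[(a, b)] = both / n
    return matrix


# Hypothetical toy data: four fractions, three loci.
fractions = [{"A", "B"}, {"A", "B", "C"}, {"C"}, {"A"}]
matrix = cosegregation_matrix(fractions, ["A", "B", "C"])
for pair, freq in matrix.items():
    print(pair, freq)
```

In this toy table, loci A and B are found together in two of four fractions, whereas each of them is found with C in only one.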
- Co-segregation may be analysed with a statistical method to determine chromatin contacts. Close spatial proximity can be a sign of specific interaction of loci. Specific interaction of loci may thus also be determined by analysing co-segregation with a statistical method.
- Statistical methods used in the method of the invention may be, e.g., inferential statistical methods.
- Statistical methods used in the examples may also be used in the method of the invention to analyse samples of different origin and/or for different loci of interest, e.g., as mentioned herein.
- the loci are determined to interact specifically when they co-segregate at a frequency higher than expected from their linear genomic distance on a chromosome. If all possible pairs of loci in the genome at a given genomic (linear) distance are considered, pairs of loci that do NOT interact will be found distributed around an average frequency of chromatin contacts (i.e., co-segregation across the collection of fractions) that depends on the genomic distance between the two loci and the degree of chromatin compaction.
- the term "contact" is used herein to describe co-segregation across the collection of fractions, i.e., a quantitative measure of interaction. For example, loci that do not interact are considered to have a contact value of zero.
- interacting pairs will have higher frequencies of chromatin contacts (i.e., co-segregation in the fractions) than the average for that genomic distance, to an extent that depends on their physical distance in the nucleus of that particular cell type.
- More complex arguments can also be considered, but an interaction can be most simply defined as a deviation from the random (three-dimensional) arrangement of the chromatin fibre, taking into consideration any additional contributing factor(s) to a non-random behaviour.
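- One simple way to express "co-segregation higher than expected for the genomic distance" is an observed/expected ratio. The sketch below is only an illustration under stated assumptions: loci are equal-sized windows indexed along one chromosome, the expected value for a pair is taken as the plain mean over all pairs at the same window separation, and the function name and toy values are hypothetical. It does not reproduce the statistical methods of the examples.

```python
from collections import defaultdict


def observed_over_expected(coseg):
    """Divide each pair's co-segregation frequency by the distance average.

    `coseg` maps (i, j) window-index pairs with i < j to a co-segregation
    frequency. The expected value for a pair is assumed to be the mean
    frequency of all pairs separated by the same number of windows.
    """
    by_distance = defaultdict(list)
    for (i, j), freq in coseg.items():
        by_distance[j - i].append(freq)
    expected = {d: sum(v) / len(v) for d, v in by_distance.items()}

    ratios = {}
    for (i, j), freq in coseg.items():
        exp = expected[j - i]
        ratios[(i, j)] = freq / exp if exp > 0 else 0.0
    return ratios


# Hypothetical toy values for four windows on one chromosome.
coseg = {
    (0, 1): 0.4, (1, 2): 0.4, (2, 3): 0.4,
    (0, 2): 0.1, (1, 3): 0.3,
    (0, 3): 0.05,
}
for pair, ratio in observed_over_expected(coseg).items():
    print(pair, round(ratio, 2))
```

A ratio clearly above 1 marks a pair that co-segregates more often than other pairs at the same linear distance; whether that excess is significant would still need a statistical test.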
- GAM-ch measures the frequency with which two loci co-segregate in the same fraction, and can measure the co-segregation of all genomic loci simultaneously, producing quantitative information that is amenable to (a) the identification of genomic coordinates that more frequently interact with other genomic regions, and (b) a wide range of mathematical treatments that calculate the probability of loci interacting above some random (expected) behaviour.
- a plurality of loci means two or more loci, optionally, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least 12, at least 13, at least 15, at least 20, at least 30, at least 40, at least 50, at least 75, at least 100, at least 200, at least 500 or at least 1000 loci and up to several million or billion loci, which are analysed simultaneously. For example, allele-specific analysis of a human cell at 5 kb resolution requires simultaneous analysis of 1.3 million loci.
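- The figure of about 1.3 million loci can be checked with a short calculation. The haploid human genome size of roughly 3.2 Gb used here is an assumption not stated in the text; the window size and the two alleles come from the example above.

```python
# Assumed haploid human genome size (~3.2 Gb); not given in the text.
GENOME_BP = 3_200_000_000
WINDOW_BP = 5_000      # 5 kb resolution, as in the example
ALLELES = 2            # allele-specific analysis of a diploid cell

windows = GENOME_BP // WINDOW_BP
loci = windows * ALLELES

print(windows)  # windows per haploid genome
print(loci)     # loci analysed simultaneously, about 1.3 million
```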
- substantially all loci or all loci in a compartment are analysed with the method of the invention, e.g., by sequencing substantially all nucleic acids, preferably, all DNA, in the compartment.
- the loci to be analysed may be determined in a biased way (e.g. by choosing to analyse all 23000 protein coding genes in a human cell, or all gene promoters or all non-coding regulatory regions, or all enhancers), or in an unbiased way, e.g. by dividing the genome into windows of a certain size, e.g., windows of 100 bp to 10 Mbp, preferably, 1 kbp to 1 Mbp, 5 kbp-50 kbp, or 10 kbp-30 kbp windows.
- the method of the invention can be applied in a way which does not distinguish between different alleles (e.g. the two homologous copies of a gene present in a normal human cell), or, alternatively, it can be used to distinguish the two (or more, in the case of e.g. polyploid amphibian cells) alleles of a locus in the same cell.
- the method of the invention allows for the detection of multiple co-segregating loci, in particular, more than two co-segregating loci, preferably, more than three, more than four, more than 8, or more than 20, co-segregating loci.
- identification of multiple interactions using 3C-based methods has been attempted and shown to be both inefficient and highly biased (Sexton et al, 2012, Cell 148:458-72).
- There is mathematical evidence showing that these experimental limitations of 3C-based methods will remain insurmountable, irrespective of incremental improvements (O'Sullivan J.M. et al., 2013, Nucleus 4:390-8).
- restriction sites are not randomly distributed in the genome, leading to a bias in detection.
- the efficiency of ligation is affected by the different length of DNA fragments, which adds further bias to 3C-based results.
- the method of the invention is preferably not or not substantially affected by these biases.
- step (b) no ligation occurs between nucleic acids originally present in the compartment, in particular, no ligation has to be performed prior to step (b).
- ligation e.g., with external linkers is possible in the context of detection of the presence or absence of nucleic acid loci, e.g., for amplification or sequencing.
- the avoidance of ligation of nucleic acids derived from the compartment with each other overcomes the structural bias of 3C-based methods.
- GAM-ch is unique compared with competing technologies: it can detect the multiplicity of loci interacting simultaneously, including cases where more than three loci interact at once (such detection being impossible or inefficient with ligation-based 3C methods), and it can detect all loci present in the compartment and their copy number, irrespective of whether they are found to participate in an interaction, which allows important corrections to be made in the contact maps. A further advantage of the method of the invention is that it can identify spatial proximity of loci that were not known before the method was carried out, i.e., interactions can be identified between newly discovered or non-defined loci. The present invention also provides the use of the method of the invention for
- the method of the invention may be used to determine specific interactions, and is capable of differentiating leading interactions from bystander interactions;
- (b) mapping loci and/or genome architecture in the compartment.
- a map, in particular a matrix, can be drawn up for specific loci or the chromosomal architecture based on the co-segregation frequencies determined;
- Chromosomal insertion of a nucleic acid due to gene therapy or other genetic engineering approaches may affect genome architecture, e.g., it may enhance or prevent interaction of regulatory regions with specific promoters and thus affect transcription of "unrelated" genes.
- the expression pattern of the introduced nucleic acid may itself depend on, or be disrupted by, its interactions with endogenous regulatory regions.
- the method of the invention allows for assessment of the effects of gene therapy or genetic engineering on the level of interaction between different loci;
- mapping chromosomal rearrangements e.g., in cancer, including in specific sub-tissue cell populations, e.g. to study clonal evolution of rearrangements;
- identifying a species in a mixture of species e.g., identifying a potentially novel microorganism species in a mixture of species
- the method of the invention may be used in identification of species in microbial communities, e.g. as described for Hi-C in Burton et al. (2014, G3 4, 1339-1346).
- step (a)(ii) specifically mapping contacts mediated by a defined factor (or molecule of interest, e.g., protein, RNA, DNA and/or their modifications), e.g., by extracting said factor and associated complexes of fragments after step (a)(ii) is carried out, e.g., by immunoprecipitation of the defined protein and associated complexes of fragments (step (a) (iii.O)).
- Option (1) may be of specific interest, as it reduces the complexity of the sample.
- the present invention thus also provides a method of diagnosing a disease associated with a disturbed co-segregation of loci in a patient, comprising, in a sample taken from said patient, analysing co-segregation of a plurality of loci in the patient, and comparing said co-segregation with co-segregation of said loci in a subject already diagnosed with said disease, wherein the co-segregation is preferably also compared with co-segregation in a healthy subject.
- co-segregation of loci may be compared between specific sub-groups of cells, which may be derived from the same patient, e.g., tumor cells and normal tissue.
- Co-segregation can also be analysed in different cell types upon derivation of pluripotent stem cells from the patient, or model organism, and their experimental differentiation into specific cell types through laboratory culture in appropriate conditions, e.g. in the presence of the appropriate factors, in a suitable container, at the appropriate temperature, e.g. 37°C for human samples.
- "a" is meant to refer to "at least one", if not specifically mentioned otherwise.
- as the present invention may be used to investigate a disturbed co-segregation of loci in a patient, i.e., chromatin misfolding, it may also inform or guide the treatment of patients having a disease associated with chromatin misfolding: after diagnosis with a method of the invention, such patients may be treated to correct chromatin misfolding (Deng W., Blobel G., 2014, Curr Op Genet Dev. 25: 1-7). The present invention may then be used to monitor the effects of such treatments on chromatin misfolding.
- Fig. 1 Limitations of current 3C-based methods due to dependency on ligation of DNA ends for capturing contacts between nucleic acids.
- In 3C-based methods, the presence of multiple loci in a single interaction may dilute the measured ligation frequency between any two loci that are members of the interaction.
- In GAM-ch, the measured interaction is not affected by multiplicity.
- Fig. 2 Outline of the GAM-ch method. Chromatin is prepared from mildly-fixed cells and randomly fragmented (1). Crosslinked chromatin is divided (aliquoted) across tubes to have <1 haploid genome equivalent per tube (2). The DNA content of each tube is determined to assess the co-segregation of genomic sequences across tubes (3). Co-segregation of genomic sequences reflects chromatin contacts of genomic sequences in the cell nucleus dependent on protein-protein and protein-RNA bridged interactions and is used to measure long-range chromatin interactions.
- A Schematic presentation of the mouse β-globin gene cluster (adapted from Tolhuis, B. et al. (2002). Looping and interaction between hypersensitive sites in the active beta-globin locus. Molecular Cell 10, 1453). Arrows and circles depict the individual hypersensitive sites.
- the β-globin genes are indicated by triangles, with active genes (βmaj and βmin) in grey and inactive genes (εy and βh1) in black.
- the olfactory receptor (OR) genes are indicated by white boxes, of which some were shown to interact with the β-globin gene cluster. Grey boxes also indicate other gene loci (3' olfactory receptor genes, Uros and Eraf), which were shown to interact with the β-globin gene cluster in embryonic liver tissue.
- LCR Locus Control Region.
- B A hypothetical 3D model of the active chromatin hub (ACH) based on population-based 3C data from Tolhuis et al. (2002). Neither the size of the ACH nor the actual position of the elements relative to each other is to scale. Hypersensitive sites and active genes of the locus form a hub of hyper-accessible chromatin (ACH). The inactive regions of the locus, having a more compact chromatin structure, are indicated in grey, with the inactive εy and βh1 genes in lighter grey. The olfactory genes are not shown. The interactions in the ACH would be dynamic in nature, in particular with the active genes (βmaj and βmin), which are alternately transcribed.
- Crosslinking frequency with value 1 arbitrarily corresponds to the crosslinking frequency between two neighbouring control fragments within the Calreticulin (CALR) gene locus, which is expressed at similar levels in the two tissues.
- a schematic illustration of the mouse β-globin gene cluster is depicted; the grey shading represents the position and size of fragments generated by HindIII restriction.
- the quality of the chromatin preparation was validated by 3C at four regions of the murine β-globin gene cluster in fetal liver and brain cells. Fetal liver and brain cells from E14.5 mouse embryos were fixed (5 or 10 min) in 2% formaldehyde, digested with HindIII and ligated under highly diluted conditions. Ligation products were quantified by qPCR using the 3' end hypersensitive site (3'HS1) as bait. Means and SEM are shown. The black vertical line indicates the position and size of the 3C-bait fragment containing 3'HS1.
- Crosslinking frequency with a value of 1 arbitrarily corresponds to the crosslinking frequency between two neighbouring control fragments (with the analyzed restriction sites being 8.3 kb apart) within the Ercc3 gene locus (on chromosome 18), which is expressed at similar levels in fetal liver and brain. Black bars indicate the position of primer pairs used for 3C.
- PK proteinase K
- GAM-ch samples marked with an asterisk were used for library preparation.
- WGA-amplified DNA was fragmented to ~400 bp using Covaris, and amplified using the Illumina library mate-pair kit. DNA fragments were excised (350-650 bp for ~0.2 and ~10 genomes, 200-650 bp for ~0.7 genomes), quantified and sequenced.
- GAM-ch is also designated xGAM.
- Fig. 6 Mapping of GAM-ch-seq datasets corresponding to ~0.2 and ~10 genomes in comparison with linear DNA.
- Gaps are defined as regions which are not covered by reads.
- the sequencing depth is calculated by dividing the genome into identical windows and counting the number of nucleotides covered by reads, which fall into each window.
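- The two measures just defined can be sketched as follows. This is an illustrative sketch only: reads are assumed to be half-open (start, end) intervals on a single chromosome, nucleotides are summed per read (so overlapping reads count more than once), and the function names and toy reads are hypothetical.

```python
def merge_intervals(reads):
    """Merge overlapping or touching half-open (start, end) read intervals."""
    merged = []
    for start, end in sorted(reads):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def gap_sizes(reads):
    """Sizes of the regions not covered by any read, between covered areas."""
    merged = merge_intervals(reads)
    return [nxt[0] - cur[1] for cur, nxt in zip(merged, merged[1:])]


def depth_per_window(reads, chrom_length, window):
    """Sum, per window, the nucleotides of every read falling into it."""
    n_windows = -(-chrom_length // window)  # ceiling division
    depth = [0] * n_windows
    for start, end in reads:
        w = start // window
        while w < n_windows and w * window < end:
            lo = max(start, w * window)
            hi = min(end, (w + 1) * window)
            depth[w] += hi - lo
            w += 1
    return depth


# Hypothetical toy reads on a 3 kb chromosome, analysed in 1 kb windows.
reads = [(0, 36), (20, 92), (990, 1062), (1500, 1572)]
print(gap_sizes(reads))
print(depth_per_window(reads, 3000, 1000))
```

A read that spans a window boundary contributes to both windows in proportion to the nucleotides that fall into each.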
- Fig. 8 Gap-size (A) and sequencing depth (B) distributions for 10 ng of linear DNA and GAM-ch (xGAM) samples at ~0.2 and ~10 genomes.
- X axes represent the gap-sizes and sequencing depth at 1 kb windows (bp) on a log10 scale.
- Y axes represent Kernel probability densities.
- Graphs are plotted using the density function in R.
- Fig. 9 Thresholds from Gaussian fitting to GAM-ch fractions with ~0.2 genomes.
- the threshold is defined as the number of reads for which the height of the Gaussian fit ( ⁇ in dotted thick line) equals the height of the entire sequencing depth distribution (Ay in thin grey line).
- X-axes represent the sequencing depth at 1 kb windows on a log10 scale.
- Y-axes represent the Kernel probability densities.
- Fig. 10 Number of "positive windows" detected when randomly sampling the original datasets of ~0.2 genomes (10 to 100%, 12 pM).
- Erosion of reads from the GAM-ch ~0.2 genome dataset shows only a mild change in detected "positive windows" when randomly sampling ~60% of reads. Information is markedly lost when <30% of reads are considered.
- the threshold used here for the detection of 4 kb windows is based on the residual analysis in Fig. 9.
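- A read-erosion analysis of this kind can be sketched as random subsampling followed by re-detection of positive windows. The sketch below makes several assumptions that are not in the text: reads are reduced to their start positions, a window is called positive when it holds at least `min_reads` reads (a simple stand-in for the threshold derived from the fitting in Fig. 9), and the function names and toy data are hypothetical.

```python
import random


def positive_windows(read_starts, window, min_reads):
    """Return the indices of windows holding at least `min_reads` reads."""
    counts = {}
    for pos in read_starts:
        idx = pos // window
        counts[idx] = counts.get(idx, 0) + 1
    return {idx for idx, n in counts.items() if n >= min_reads}


def erosion_curve(read_starts, window, min_reads, fractions, seed=0):
    """Count positive windows after keeping a random fraction of the reads."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    curve = {}
    for frac in fractions:
        keep = round(len(read_starts) * frac)
        sample = rng.sample(read_starts, keep)
        curve[frac] = len(positive_windows(sample, window, min_reads))
    return curve


# Hypothetical toy data: read start positions, 4 kb windows, threshold of 2 reads.
starts = [10, 20, 30, 5010, 5020, 9000]
print(erosion_curve(starts, 4000, 2, [0.5, 1.0]))
```

Because every subsample is drawn from the full read set, the number of positive windows at any fraction cannot exceed the number found with all reads.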
- Fig. 11 Outline of the GAM-ch method in combination with immunoprecipitation of chromatin bound by, e.g., RNA polymerase II.
- Chromatin crosslinking and fragmentation: e.g., chromatin is prepared from mildly fixed cells and randomly fragmented, e.g. by sonication (1).
- Fragment enrichment: e.g., by immunoprecipitation of a specific chromatin-bound protein such as RNA polymerase II (2).
- Division of the fragmented nucleic acids to obtain a collection of fractions (every fraction contains <1 copy of every locus, typically <0.5 copies): crosslinked chromatin is either directly divided (aliquoted) across tubes to have <1 haploid genome equivalent per tube (3a), or (optionally) first enriched for chromatin occupied by a given protein (or other bound molecule of interest), e.g. by chromatin immunoprecipitation (3b).
- Extraction and detection of nucleic acids: e.g., the DNA content of each tube is extracted and identified to assess the co-segregation of genomic sequences across tubes (4). Co-segregation of genomic sequences reflects chromatin contacts of genomic sequences in the cell nucleus dependent on protein-protein and protein-RNA bridged interactions and is used to measure long-range chromatin interactions.
- Boxes: enhancers; thick black line: active gene; medium thick line: inactive gene; arrows: promoters.
- RNA polymerase II occupies active gene promoters, coding regions and enhancers.
- RNAPII-S5p ChIP-seq signal at promoters, also called transcription start sites (TSS)
- TSS transcription start sites
- TES transcription end/termination sites
- Transcriptionally silent genes are not occupied by RNAPII-S5p.
- the average occupancy profiles are represented at ±5 kb windows centered at the transcription start site (TSS) or transcription end site (TES). All mouse genes were ranked by their expression levels determined by mRNA-seq in mouse ESCs (Brookes et al. 2012); the top 25% of genes were then selected as the most actively transcribed genes and the bottom 25% as the most transcriptionally silent.
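- The selection of the most active and most silent genes by expression rank can be sketched as below. The gene names and expression values are hypothetical placeholders, the function name is an assumption, and ties are broken by the original ordering.

```python
def split_by_expression(expression, fraction=0.25):
    """Return (most_active, most_silent) gene lists by expression rank.

    `expression` maps gene name to an expression level (e.g. from mRNA-seq).
    The top and bottom `fraction` of the ranked genes are returned.
    """
    ranked = sorted(expression, key=expression.get, reverse=True)
    k = int(len(ranked) * fraction)
    if k == 0:
        return [], []
    return ranked[:k], ranked[-k:]


# Hypothetical expression values for eight placeholder genes.
expression = {
    "g1": 50.0, "g2": 0.0, "g3": 12.0, "g4": 3.0,
    "g5": 100.0, "g6": 1.0, "g7": 7.0, "g8": 0.5,
}
active, silent = split_by_expression(expression)
print(active)
print(silent)
```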
- RNA polymerase II co-associates with enhancers.
- RNAPII-S5p is present at enhancers defined in murine ESCs according to Whyte et al. (2013). Background levels of ChIP signal were determined by a control ChIP experiment using a non-specific antibody against the plant steroid digoxigenin.
- RNAPII-S5p occupancy determined by ChIP combined with quantitative PCR at active, Polycomb-repressed and inactive genes. Quantitative PCR confirms the expected enrichment of RNAPII-S5p at active (Oct4) and Polycomb-repressed (Nkx2.2, HoxA7) genes, and its absence at the inactive (Myf5) gene (Stock et al. 2007). Background levels (mean enrichment after ChIP with a non-specific antibody against the plant steroid digoxigenin) at promoter and coding regions are shown in black bars. Means and standard deviations from three biological replicates are shown.
- ChIP-enriched positive windows for different starting amounts of chromatin immunoprecipitated DNA.
- the percentage of positive windows is higher for GAM-chIP samples with larger amounts of input DNA.
- ChIP-enriched positive windows were determined by the number of reads in each 5 kb window from published RNAPII-S5p ChIP-seq data obtained in mESC (Brookes et al. 2012). The top 2% of 5 kb windows were taken as the genomic windows most enriched for RNAPII-S5p.
- Fig. 14 GAM-chIP raw data and detection of positive genomic windows.
- GAM-chIP profiles of raw sequencing data across two genes show that more positive windows are detected across an actively transcribed gene than an inactive gene. Represented tracks from top to bottom: 1 - RNAPII-S5p ChIP-seq in mESC; 2 - cumulative window detection frequency across 182 GAM-chIP datasets; 3-7 - raw sequencing data for five randomly chosen GAM-chIP datasets together with representation of positive windows defined by fitting binomial distributions (black horizontal bars) or by the JAMM peak-finder approach (striped horizontal bars); 8 - raw sequencing data for a control sample containing no chromatin immunoprecipitated material (water control). Images were obtained from UCSC Genome Browser using mean as windowing function. Schematic representation of the genes present in the selected regions is shown underneath.
- Fig. 15 Quality controls of GAM-chIP dataset.
- Each GAM-chIP sample contains only a restricted subset of sequences from each chromosome. Each mouse chromosome was divided into 5 kb windows, and the percentage of positive 5kb windows was plotted for each chromosome and for each GAM-chIP sample. No GAM-chIP sample contains more than 12% of any given chromosome, and all chromosomes are comparable in coverage except for chromosome X, which is present in only a single copy (whereas autosomal chromosomes are present in two copies), as expected in the male ESC line used.
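- The per-chromosome quality control described above amounts to dividing the number of positive windows by the total number of windows on each chromosome. A minimal sketch follows; the chromosome lengths and positive-window sets are hypothetical toy values, and the function name is an assumption.

```python
def percent_positive(positive, chrom_lengths, window=5000):
    """Percentage of windows called positive on each chromosome.

    `positive` maps chromosome name to the set of positive window indices
    for one sample; `chrom_lengths` maps chromosome name to its length in bp.
    """
    result = {}
    for chrom, length in chrom_lengths.items():
        n_windows = -(-length // window)  # ceiling division
        hits = len(positive.get(chrom, set()))
        result[chrom] = 100.0 * hits / n_windows
    return result


# Hypothetical toy sample: two 100 kb chromosomes, 5 kb windows.
chrom_lengths = {"chr1": 100_000, "chrX": 100_000}
positive = {"chr1": {0, 3}, "chrX": {1}}
print(percent_positive(positive, chrom_lengths))
```

In this toy sample the single-copy chromosome is detected in half as many windows as the two-copy chromosome, which mirrors the lower coverage expected for chromosome X in a male cell line.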
- RNAPII-S5p occupancy in ChIP-seq datasets (from published data; Brookes et al. 2012).
- the TSS-overlapping 5 kb windows with the least binding of RNAPII-S5p are detected in 4.4% of GAM-chIP samples on average, whereas those with the most abundant binding are detected in an average of 12.5% of GAM-chIP samples.
- the percentage of 5 kb positive windows overlapping transcriptionally active genes and enhancers is higher than the percentage of 5 kb positive windows overlapping transcriptionally silent genes.
- the percentage of positive windows is shown for gene body (gene), promoters (transcription start site, TSS) and transcription end site (TES).
- the sets of most actively transcribed and of most silent genes were chosen based on their expression levels, as determined by mRNA-seq in a published dataset (Brookes et al. 2012). Positive 5 kb windows overlap gene promoters with high RNAPII-S5p levels (as determined from a published ChIP-seq dataset; Brookes et al. 2012) more often than gene promoters with low RNAPII-S5p levels.
- Fig. 16 Co-segregation of genomic windows within actively transcribed genes in GAM-chIP samples. GAM-chIP samples containing multiple positive windows from the same actively transcribed gene occur more frequently than GAM-chIP samples containing multiple positive windows from the same silent genes and more often than would be expected by chance, confirming that chromatin contacts can be formed within actively transcribed genes during transcription (as schematized in Fig. 11).
- Fig. 17 Co-segregation of genic regions of actively transcribed genes coincides with preferential co-segregation of nearby enhancers in GAM-chIP samples.
- for active (but not for silent) genes, the nearest enhancer was more frequently observed in the GAM-chIP samples with the highest number of positively detected intragenic windows.
- co-segregation of nearby enhancers in the same GAM-chIP samples as actively transcribed genes is indicative of a chromatin interaction between the enhancer and gene during transcription.
- HAPPY Mapping is based on the co-segregation and detection of nearby DNA markers in the genome and uses limiting dilutions of fragmented DNA to single molecule contents.
- LOD logarithm of the odds
- GAM-ch applies the basic principle of HAPPY Mapping to a different purpose: instead of measuring linear genomic distances, it measures long-range chromatin interactions between any genomic regions within the three-dimensional cell.
- Cells are first treated with a crosslinking agent which, for example, chemically crosslinks proximal genomic regions in the same or different chromosomes, before chromatin fractionation.
- GAM-ch detects chromatin proximity but does not require ligation of crosslinked DNA fragments.
- In GAM-ch, chromatin preparations similar to those used for 3C are prepared and diluted as for HAPPY Mapping, before quantification of co-segregation frequency; genomic regions that are bridged by proteins and crosslinked during the chromatin preparation will co-segregate more frequently than genomic regions that do not interact (Fig. 2).
- GAM-ch can provide single allele information about multiplicity of interactions, i.e. multiple genomic regions interacting at the same time with a given allele.
- In 3C, a given DNA fragment in a high multiplicity chromatin interaction can only ligate with one or two (at high restriction and ligation efficiency) other DNA fragments.
- This limitation of 3C makes it difficult to distinguish, for example, between a low-frequency chromatin interaction involving only two fragments and an interaction that involves many genomic partners at high frequency across the cell population (Fig. 1).
- the same 3C signal, e.g. a measured contact of 50%, can be due to an interaction that occurs for half the alleles in the cell population if the multiplicity of interaction is only two (or possibly three), or be due to an interaction that occurs in all alleles (real contact frequency is 100%) but is underestimated to only 50% due to competition with other bound DNA fragments that co-bind at high multiplicity, thereby diluting the probability of ligation between any single fragment and all others.
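- The ambiguity described above can be made concrete with a deliberately simple toy model, which is an assumption for illustration and not a model given in the text: each fragment in an interaction of a given multiplicity ligates to exactly one partner, chosen uniformly among the other fragments, so the pairwise ligation signal is the true allele fraction divided by the number of competing partners.

```python
def measured_pairwise_signal(allele_fraction, multiplicity):
    """Toy pairwise 3C signal under a one-ligation-per-fragment assumption.

    `allele_fraction` is the share of alleles in which the interaction
    occurs; `multiplicity` is the number of fragments in the interaction.
    Each fragment is assumed to ligate to one of its (multiplicity - 1)
    partners with equal probability.
    """
    if multiplicity < 2:
        raise ValueError("an interaction needs at least two fragments")
    return allele_fraction / (multiplicity - 1)


# Two different situations that give the same toy signal:
print(measured_pairwise_signal(0.5, 2))  # half the alleles, two fragments
print(measured_pairwise_signal(1.0, 3))  # all alleles, three fragments
```

Under this assumption both cases yield the same value, so the ligation signal alone cannot separate contact frequency from multiplicity, whereas co-segregation in a fraction does not depend on which fragments happen to ligate.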
- each GAM fraction was subjected first to WGA fragmentation, primer ligation and PCR amplification. WGA-amplified GAM-ch samples were then further amplified using the Illumina library preparation, which adds new sets of primers at each end of the DNA fragments. GAM-ch-seq samples were sequenced using the Illumina sequencing platform (Table 1). As recent 3C-based genome-wide mapping approaches use HindIII digestion, instead of sonication, this approach was also adopted here. Validation of HindIII-digested chromatin preparations was performed by 3C analyses (Fig. 3D). Linear DNA was used in parallel to test the effects of WGA and high-throughput sequencing on sequence representation, and as a positive control.
- GAM-ch samples were prepared for Illumina sequencing as described for 3C and validated by 3C-qPCR using published primer sequences (Fig. 3D). Nuclei from fetal liver cells, fixed for 5 min, were extracted, counted using a haemocytometer, subjected to digestion with HindIII (digestion efficiency of ~77%), and aliquots of ~100 genomes/μL were prepared and frozen for further use. Different genome numbers of 3C-like chromatin were first subjected to WGA fragmentation (1 h at 50°C with PK and 4 min at 99°C) and amplification (~0.2, ~0.7 and ~10 genomes/tube; Fig. 4B). Linear human DNA (2 ng; provided with the WGA kit) was used as a positive control for the WGA reaction.
- Fragment sizes of crosslinked chromatin range from ~0.3-2 kb, whereas linear DNA is less fragmented upon WGA, probably due to lower-sized DNA fragments present in HindIII-digested chromatin (average distance between HindIII restriction sites is ~4 kb in the mouse genome).
- GAM-ch samples of ~0.2 genomes did not show visible products on ethidium bromide gels after WGA amplification (Fig. 4B), but yielded visible products upon preparation of the sequencing library (Fig. 6).
- GAM-ch samples were subjected to Illumina library preparation and DNA fragments were size-selected (350-650 bp for ~0.2 and ~10 genomes, 200-650 bp for ~0.7 genomes) and sequenced. Since the ~0.7 genome GAM-ch sample showed less-intense WGA products, DNA fragments from a wider size range were excised and sequenced. Linear mouse DNA was also amplified by WGA and Illumina library kits (not shown) and sequenced in parallel.
- Each unmappable read is trimmed at its 5' end by 36 nts and mapped back to the genome. For the remaining reads that still do not align, 36 nts are trimmed at the 3' end of the read, and the resulting 36 nt read is realigned to the genome.
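- The two-step trimming strategy can be sketched as below. Several points are assumptions for illustration: the 3' trim is applied to the original (untrimmed) read, `try_align` is a hypothetical stand-in for a real aligner, and the exact-match toy aligner and short sequences (with `trim=4` instead of 36) exist only to keep the example small.

```python
def align_with_trimming(read, try_align, trim=36):
    """Align the full read, then retry with the 5' end, then the 3' end, removed.

    `try_align` is a hypothetical callable that takes a sequence and returns
    a genome position, or None when the sequence does not align.
    """
    candidates = [("full", read)]
    if len(read) > trim:
        candidates.append(("5p-trimmed", read[trim:]))   # drop `trim` nt at the 5' end
        candidates.append(("3p-trimmed", read[:-trim]))  # drop `trim` nt at the 3' end
    for label, seq in candidates:
        hit = try_align(seq)
        if hit is not None:
            return hit, label
    return None, "unmapped"


def exact_match_aligner(genome):
    """Toy aligner: exact substring search in a reference string."""
    def try_align(seq):
        pos = genome.find(seq)
        return pos if pos >= 0 else None
    return try_align


# Hypothetical toy genome and reads carrying non-genomic sequence at one end.
aligner = exact_match_aligner("AAAACCCCGGGGTTTTACGTACGT")
print(align_with_trimming("NNNNCCCCGGGG", aligner, trim=4))
print(align_with_trimming("GGGGTTTTNNNN", aligner, trim=4))
```

The label returned with each position records which attempt succeeded, which makes it easy to report how many reads were rescued by each trimming step.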
- This trimming strategy increased the overall percentage of alignment to ~54 ± 6% (Fig. 5B). This trimming pipeline is not necessary for libraries produced using Illumina Nextera library kits, as the library production relies on tagmentation.
- GAM-ch libraries obtained from ~0.2 genomes show a more clustered distribution of sequencing reads with higher enrichment, as expected due to lower genomic content. This is consistent with a lower diversity of DNA fragments in the ~0.2 genome libraries. The higher enrichment suggests that the amount of sequence obtained may already be sufficient to over-represent this diversity.
- the first step in the analysis of GAM-ch samples is to detect DNA fragments that are present or absent in each GAM-ch sample analysed with subgenomic content. This requires the definition of background read distribution, and a decision about an appropriate window size.
- the window size should reflect the average size of the DNA fragments present in 3C-like chromatin. For HindIII restriction, this corresponds to ~4 kb fragments.
- Two different statistical approaches were performed to analyse and to compare sequencing results from multiple libraries. First, the distribution of the gap-size between adjacent covered areas of the genome was analysed, and second the sequencing depth at different window sizes was studied (Fig. 7). Both approaches were used to analyse the sequencing results from linear DNA and GAM-ch samples (Fig. 8).
- the content of GAM-ch samples with ~10 genomes also shows an even distribution across the genome, meaning that the whole genome is covered, which suggests that DNA extraction from 3C-like chromatin is efficient.
- the average gap size peaks at ~1 kb (Fig. 8A) and displays a second population of gap-sizes of ~100 bp. This may reflect the fact that not all genomic regions are represented in this low DNA content sample; it can be the result of interacting DNA sequences within short range distances (as seen in 4C results) being frequently brought together due to crosslinking; further sequencing experiments and analyses are currently ongoing to investigate the significance of the different gap distributions.
- sequenced reads in the ~10 genomes sample are sequenced multiple times, such that each 1 kb window is covered by more reads, with the distribution of sequencing depths peaking at ~500 nts per 1 kb window (Fig. 8B). Since sequences are a mix of 36 and 72 nt reads, which will appear as single spikes representing a multiple unit of 36 nts in the sequencing depth distribution, each average read would contain about 50 nts ((36+72)/2 = 54). Therefore, each 1 kb window of the ~10 genomes GAM-ch sample would be covered by ~10 reads. In addition, many windows with <10 reads exist, which are visualized by the left spiky tail in the sequencing depth curve and are hardly distinguishable from the main population of windows with 10 reads.
- GAM-ch samples with ~0.2 genomes contain only a fraction of the genome, as seen in wider gaps in the read distribution and a gap-size peaking at ~50 kb, with additional shoulders reflecting non-random spacing between DNA fragments; this is consistent with the presence of chromatin interactions in these few GAM-ch samples.
- the less diverse set of fragments that are sequenced in the ~0.2 genomes sample are sequenced more frequently than fragments in the GAM-ch ~10 genomes sample, resulting in about 5000 sequenced nucleotides in each 1 kb window, corresponding to ~100 reads per 1 kb window.
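- The read-count estimates quoted for the two samples follow from dividing the nucleotides per window by the mean read length. The short check below uses only the numbers given above (36 and 72 nt reads, roughly 500 and 5000 nts per 1 kb window); the function name is hypothetical, and the equal mix of the two read lengths is the same simplification the text makes.

```python
READ_LENGTHS = (36, 72)  # the two read lengths present after trimming
mean_read = sum(READ_LENGTHS) / len(READ_LENGTHS)  # assumes an equal mix


def reads_in_window(nucleotides):
    """Approximate number of reads behind a per-window nucleotide count."""
    return nucleotides / mean_read


print(mean_read)                        # mean read length in nts
print(round(reads_in_window(500), 1))   # about 10 genomes sample
print(round(reads_in_window(5000), 1))  # about 0.2 genomes sample
```

With the exact mean of 54 nts the estimates come out slightly below the rounded values of about 10 and about 100 reads per window, which is within the approximation used in the text.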
- the GAM-ch sample with ~10 genomes did not have enough sequencing depth to sufficiently resolve the signal from the noise distribution.
- the threshold is 790 nts (~11 reads of 72 nts), in the same order of magnitude as that of the residual distribution approach.
- GAM-chIP combining GAM-ch with immunoprecipitation to capture fragments co-occupied by RNA polymerase II phosphorylated on Serines.
- the DNA fragments bound by a specific protein are selected from the bulk chromatin, e.g. by chromatin immunoprecipitation (ChIP), a strategy called GAM-chIP.
- GAM-chIP is performed with an additional step in which crosslinked chromatin fragments, containing a given protein or protein post-translational modification, are first selected prior to their dilution between tubes, e.g. to enrich for fragments containing genes and regulatory regions (enhancers) (Fig. 11). Including this additional selection step has two advantages: first it allows for detection of chromatin contacts which are formed in the presence of the given protein or protein post-translational modification.
- RNAPII-S5p: DNA fragments bound by RNA polymerase II phosphorylated on the Serine-5 residue of the CTD, which we abbreviate to RNAPII-S5p.
- RNAPII-S5p was chosen because it has high occupancy at active genes, especially at promoters, throughout coding regions and transcription termination sites, and enhancers (Fig. 12A,B). Combining GAM-ch with ChIP for RNAPII-S5p therefore has the potential of increasing the power of GAM-ch to detect contacts between enhancers and their target genes.
- chromatin was crosslinked using formaldehyde and fragmented by sonication, then chromatin fragments bound by RNAPII-S5p were selected by immunoprecipitation using a specific antibody coupled to beads (CTD-4H8, Covance; according to Brookes et al 2012).
- CTD-4H8, Covance: a specific antibody coupled to beads
- fragments resulting from ChIP were eluted from beads, and fractionated/diluted into a multitude of fractions and WGA amplified.
- ChIP of RNAPII-S5p bound DNA fragments was performed as described previously (Stock et al. 2007; Brookes et al. 2012).
- Mouse embryonic stem cells (ESCs) were fixed in 1% formaldehyde for 10 min. Nuclei were then extracted, counted using a haemocytometer, and chromatin was extracted using sonication. Sonicated chromatin fragments bound by RNAPII-S5p were selected by immunoprecipitation.
- RNAPII-S5p was validated using quantitative PCR of DNA fragments known to be bound by RNAPII-S5p in mouse ESCs, namely promoters and coding regions of active and Polycomb-repressed genes (Fig. 12C); inactive gene promoter and coding region were used as negative control.
- a control ChIP experiment was performed with nonspecific antibody against plant steroid digoxigenin, which showed no DNA fragment enrichment, as expected (Stock et al. 2007; Brookes et al. 2012). This analysis demonstrated that the antibody immunoprecipitation step had successfully and efficiently selected RNAPII-S5p-bound chromatin fragments (Fig. 12C).
- the immunoprecipitated chromatin material was divided (aliquoted) into multiple tubes at the chosen dilution factor based on the measured DNA concentration.
- GAM-chIP samples show a fragment size distribution of ~100 bp to ~1200 bp following WGA amplification (Fig. 13A), slightly smaller than for GAM-ch samples prepared by HindIII digestion without chromatin immunoprecipitation (Fig. 4B).
- the fragment size distributions and the amount of DNA after amplification were comparable between different samples prepared from the same concentration of input DNA (Fig. 13B).
- GAM-chIP samples from the first two exploratory experiments were subjected to Illumina TruSeq Nano library preparation (Table 2).
- An exploratory GAM-chIP dataset was collected consisting of 182 GAM-chIP samples (Table 2, GAM-chIP Exp003), each generated from 1 pg of chromatin after ChIP for RNAPII-S5p, plus four positive controls containing 500 pg of the same chromatin, and four negative controls where no chromatin was added (water control).
- GAM-chIP samples in this exploratory collection were WGA amplified and subjected to Illumina Nextera XT library preparation. DNA fragments from 300-500 bp were size-selected and sequenced.
- the mouse genome was divided into 5 kb windows and the number of sequencing reads mapping to each window was calculated.
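The windowing step described above can be sketched as follows. This is an illustrative example only: it assumes reads are available as `(chromosome, position)` pairs with 0-based positions, and the function name `count_reads_per_window` is hypothetical rather than taken from the original pipeline.

```python
from collections import Counter

WINDOW_SIZE = 5000  # 5 kb windows, as in the text


def count_reads_per_window(reads, window_size=WINDOW_SIZE):
    """Count mapped reads per fixed-size genomic window.

    reads: iterable of (chromosome, position) tuples, position 0-based.
    Returns a Counter keyed by (chromosome, window_index).
    """
    counts = Counter()
    for chrom, pos in reads:
        counts[(chrom, pos // window_size)] += 1
    return counts


example_reads = [("chr1", 0), ("chr1", 4999), ("chr1", 5000), ("chr2", 12000)]
example_counts = count_reads_per_window(example_reads)
```

Integer division of the read position by the window size assigns each read to exactly one window, so a read is never counted twice.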
- a two-curve fitting strategy was applied to distinguish signal from noise in GAM-chIP datasets.
- the distribution of sequencing depth over 5 kb windows was fit with a negative binomial distribution (representing sequencing noise) and a lognormal distribution (representing true signal).
- a threshold number of reads x was determined, where the probability of observing more than x "noise" reads mapping to a single genomic window was less than 0.001. Such a threshold was thus independently determined for each sample, and windows were scored as positive if the number of sequenced reads was greater than the determined threshold.
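Once noise parameters have been fitted, the threshold x can be derived as sketched below. This is a minimal sketch under stated assumptions: the negative binomial is parameterised by a size `r` and success probability `p`, the function names are hypothetical, and the parameter values in the example are arbitrary rather than fitted values from the datasets described here.

```python
import math


def nb_pmf(k, r, p):
    """Negative binomial probability of observing k noise reads.

    r is the size (dispersion) parameter and p the success probability;
    this parameterisation is an assumption made for illustration.
    """
    log_pmf = (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
               + r * math.log(p) + k * math.log1p(-p))
    return math.exp(log_pmf)


def noise_threshold(r, p, alpha=0.001, max_reads=100000):
    """Smallest x such that P(noise reads > x) < alpha."""
    cdf = 0.0
    for x in range(max_reads + 1):
        cdf += nb_pmf(x, r, p)
        if 1.0 - cdf < alpha:
            return x
    raise ValueError("no threshold found below max_reads")


def call_positive_windows(read_counts, threshold):
    """Windows whose read count is strictly greater than the threshold."""
    return {window for window, n in read_counts.items() if n > threshold}


# Arbitrary example parameters (not fitted values from the text)
example_threshold = noise_threshold(r=1.0, p=0.5)
```

Because the threshold depends only on the fitted noise parameters, it can be recomputed independently for each sample, as the text describes.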
- GAM-ch and GAM-chIP experiments have the greatest statistical power when the chance of a given tube containing a given locus of interest is ~0.5.
- the loci of interest are those which are bound by the protein targeted for enrichment, which can be identified by sequencing the bulk immunoprecipitated chromatin (ChIP-seq) without dilution and WGA amplification.
- As an estimation of the complexity of the datasets produced in the second experiment (Exp002), we determined the number of sequencing reads mapping to each 5 kb window by ChIP-seq of RNAPII-S5p using a published ChIP-seq dataset obtained in mouse ESCs (Brookes et al. 2012). The top 2% of 5 kb windows were taken as the genomic windows "most enriched for RNAPII-S5p". The percentage of "RNAPII-S5p most enriched windows" identified as positive in each GAM-chIP sample was determined (Fig. 13C). The percentage of most enriched windows identified as positive in each GAM-chIP dataset was highest for GAM-chIP samples with larger amount of input DNA, but was 2-16%, i.e.
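The complexity estimate described above (select the top 2% of windows by ChIP-seq read count, then ask what percentage of them is positive in a given GAM-chIP sample) can be sketched as follows. The function names and the toy input data are hypothetical and serve only to illustrate the calculation.

```python
def top_fraction_windows(chipseq_counts, fraction=0.02):
    """Return the set of windows with the highest ChIP-seq read counts.

    chipseq_counts: dict mapping window id -> ChIP-seq read count.
    """
    n_top = max(1, int(len(chipseq_counts) * fraction))
    ranked = sorted(chipseq_counts, key=chipseq_counts.get, reverse=True)
    return set(ranked[:n_top])


def percent_detected(most_enriched, positive_windows):
    """Percentage of most-enriched windows called positive in one sample."""
    return 100.0 * len(most_enriched & positive_windows) / len(most_enriched)


# Toy data: 100 windows whose ChIP-seq read count equals their index
toy_counts = {i: i for i in range(100)}
toy_top = top_fraction_windows(toy_counts)
toy_percent = percent_detected(toy_top, {99, 5})
```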
- the exploratory GAM-chIP RNAPII-S5p dataset consisted of 182 samples containing 1 pg of ChIP DNA, four samples with 500 pg DNA (positive controls) and four samples without DNA (negative controls). Positive windows were identified for each of these 190 samples as outlined above for the other GAM-chIP datasets. Positive windows were examined in the UCSC Genome Browser and compared to the raw sequencing data, confirming that the window-calling approach was performing sensibly (Fig. 14). We confirmed that each GAM-chIP sample contained only a subset of 5 kb windows, whilst very few positive windows were identified for the negative control samples, in support of the feasibility of the approach.
- the 182 GAM-chIP samples were collected in two batches, each of which was further divided into four pools for independent sequencing to achieve sufficient sequencing depth.
- the first four pools were WGA amplified immediately after ChIP; the second four pools were WGA amplified from the same ChIP material following storage at -20°C after the aliquoting step but before WGA amplification.
- This collection of GAM-chIP samples gave a total of eight pools, each containing around 24 GAM-chIP samples.
- quality control of purity of the amplified material from very small amounts of mouse DNA fragments, i.e. 1 pg
- the percentage of sequencing reads from each library that could be successfully mapped back to the mouse genome was plotted by library pool number (Fig. 15A).
- the negative control samples yielded very low percentages of mapped reads to the mouse genome, indicating that they were not contaminated by mouse DNA (e.g. from the GAM-chIP samples processed in parallel) during the WGA amplification or library preparation steps.
- Positive control samples (each with 500 pg of DNA) yielded the highest percentage of mapped reads (85% on average), whilst 178 out of 182 GAM-chIP libraries showed robust read mapping rates to the mouse genome of >70%.
- the distribution of the percentage of mapped reads was highly reproducible between samples and between sequencing pools. In particular, pools 5 to 8 did not yield a smaller percentage of mapped reads than pools 1 to 4, indicating that they were not affected by the addition of the freezing step (Fig. 15A).
- each GAM-chIP sample contains only a restricted subset of sequences from each chromosome (Fig. 15B).
- No GAM-chIP sample contains more than 12% of any given chromosome, and all chromosomes are comparable in coverage except for chromosome X, which is present in only a single copy (whereas autosomal chromosomes are present in two copies), as expected in the male ESC line used.
- RNAPII-S5p antibodies shows abundant detection of DNA fragments co-occupied by RNA polymerase II phosphorylated on Serine-5
- RNAPII-S5p is most abundant at actively transcribed genes, and in particular at their promoters (Fig. 12A). To confirm that the promoters of genes more highly bound by RNAPII-S5p are also more frequently detected in GAM-chIP samples, 5 kb windows overlapping gene promoters were identified and sorted into five equal groups (quantiles) according to the occupancy of RNAPII-S5p (as determined by ChIP-seq, published dataset from Brookes et al. 2012; Fig. 15C). As expected, the detection frequency of 5 kb windows that overlap gene promoters (also called transcription start sites or TSSes) increases with increased chromatin occupancy of RNAPII-S5p.
- TSSes: transcription start sites
- The TSS-overlapping 5 kb windows with the lowest binding of RNAPII-S5p are detected in 4.4% of GAM-chIP samples on average, whereas those windows with the highest binding are detected in an average of 12.5% of GAM-chIP samples (Fig. 15C).
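The grouping of TSS-overlapping windows into five equal occupancy groups, followed by averaging the detection frequency within each group, can be sketched as below. The function name, the input dictionaries and the toy values are hypothetical; real occupancy values would come from the published ChIP-seq dataset and real detection frequencies from the GAM-chIP samples.

```python
def mean_detection_by_group(occupancy, detection_freq, n_groups=5):
    """Mean detection frequency of windows grouped by occupancy.

    occupancy: dict window id -> RNAPII-S5p occupancy (e.g. ChIP-seq reads).
    detection_freq: dict window id -> fraction of samples with the window.
    Windows are sorted by occupancy and split into n_groups equal groups;
    the last group absorbs any remainder.
    """
    ranked = sorted(occupancy, key=occupancy.get)
    size = len(ranked) // n_groups
    means = []
    for g in range(n_groups):
        start = g * size
        end = (g + 1) * size if g < n_groups - 1 else len(ranked)
        group = ranked[start:end]
        means.append(sum(detection_freq[w] for w in group) / len(group))
    return means


# Toy data: detection frequency rises with occupancy
toy_occupancy = {i: float(i) for i in range(10)}
toy_detection = {i: i / 100 for i in range(10)}
toy_means = mean_detection_by_group(toy_occupancy, toy_detection)
```

A monotonic rise of the group means from the lowest to the highest occupancy group corresponds to the trend reported for Fig. 15C.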
- Future experiments will include the use of larger DNA fragment amounts per sample, to reach detection of genomic windows most abundantly occupied by RNAPII-S5p closer to the optimal 0.5 frequency of detection of each fragment, which will provide optimal chromatin contact information from the least number of samples (as expected from linear HAPPY Mapping).
- One possible use for GAM-chIP is to identify enhancers regulating the expression of given genes.
- RNAPII-S5p is expected to be found at transcriptionally expressed genes and enhancers but not transcriptionally silent genes (Fig. 12A,B), and was therefore chosen as a suitable target for the exploratory GAM-chIP experiment in order to increase the potential to identify interactions within and between enhancers and active genes.
- the use of different proteins for immunoprecipitation may yield optimal co-segregation of promoters and their target enhancers.
- mouse genes were ranked according to their expression level, as determined by mRNA-seq. The top 25% of genes were selected as most actively transcribed genes, whilst the bottom 25% of genes were selected as transcriptionally silent genes. 5 kb windows were identified that overlapped the gene body, transcription start site (TSS) or transcription end site (TES) of genes in the top or bottom 25% by expression.
- TSS transcription start site
- TES transcription end site
- the percentage of 5 kb windows overlapping each feature that were identified as positive was plotted for each of the 182 GAM-chIP samples and compared to the percentage of all 5 kb windows or of 5 kb windows overlapping enhancers detected as positive in each sample (Fig. 15D).
- 5 kb windows overlapping the gene body, TSS or TES of a silent gene were detected slightly less frequently than the average for all 5 kb windows.
- chromatin contacts can form within the bodies of actively transcribed genes (Larkin, Cook & Papantonis, 2012). This means that distant regions within the same gene should be crosslinked both to each other and to RNAPII-S5p.
- GAM-chIP identifies the presence or absence of genomic loci across a collection of tubes. If actively transcribed genes interact with themselves during transcription, some tubes will contain many chromatin fragments derived from the same gene, which were crosslinked to each other during the fixation step. Alternatively, if actively transcribed genes do not interact with themselves, a smaller number of tubes will contain multiple windows from the same gene by chance alone.
- GAM-chIP detects co-association of actively transcribed genes with nearest candidate enhancer regions
- Genomic windows overlapping enhancers should therefore co-segregate in the same GAM-chIP samples as the genomic windows overlapping their target genes. Furthermore, since different parts of each gene also contact themselves during transcription, GAM-chIP samples containing multiple positive windows from the same gene are the most likely to have originated from the gene during its transcription cycle and therefore likely to additionally co-segregate with the enhancer.
- For each gene, we ordered the GAM-chIP samples according to the proportion of intragenic windows detected. GAM-chIP samples which contain many positive windows from the same active gene often also contain a nearby enhancer, whereas GAM-chIP samples containing few positive windows from the same gene are less likely to additionally contain the enhancer (Fig. 17A). In contrast, this behaviour is not expected for silent genes, since these genes are not expected to contact nearby regions classified as enhancers in mouse ESCs. For silent genes, the detection of a nearby enhancer is often uncorrelated with the detection of the gene itself (Fig. 17B). With a larger collection of GAM-chIP samples each produced from fragment frequencies closer to 0.5, it should be possible to assign enhancers to their target genes based on the correlation of detection of the enhancer with detection of the gene across the collection of samples.
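One simple way to quantify the correlation between detection of an enhancer and detection of a gene across samples is the phi coefficient (the Pearson correlation of two binary vectors). The sketch below is an illustration of that idea only; the text does not specify which correlation measure would be used, and the function name and toy vectors are hypothetical.

```python
import math


def phi_coefficient(gene_detected, enhancer_detected):
    """Correlation of two presence/absence vectors across samples.

    Each argument is a list of 0/1 values, one per GAM-chIP sample.
    Returns 0.0 when either locus is detected in all or none of the samples.
    """
    n = len(gene_detected)
    both = sum(1 for g, e in zip(gene_detected, enhancer_detected) if g and e)
    n_gene = sum(gene_detected)
    n_enh = sum(enhancer_detected)
    denom = math.sqrt(n_gene * (n - n_gene) * n_enh * (n - n_enh))
    if denom == 0:
        return 0.0
    return (n * both - n_gene * n_enh) / denom


# Toy example: perfectly co-segregating and perfectly anti-correlated loci
phi_same = phi_coefficient([1, 0, 1, 0], [1, 0, 1, 0])
phi_opposite = phi_coefficient([1, 0, 1, 0], [0, 1, 0, 1])
```

Values near 1 indicate that the two loci tend to be found in the same tubes, values near 0 indicate independent detection.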
- GAM-ch samples with ~0.2 and ~10 genomes were subjected to WGA and detected by next-generation sequencing.
- the sequencing profile of the GAM-ch-0.2 sample has distinct islands across the genome whereas linear DNA at high concentration is evenly distributed (Fig. 6).
- the sequencing profile of ~0.2 genomes suggests that only a sub-fraction of the genome is captured, which is then frequently sequenced, as expected (Fig. 8B).
- the threshold of signal detection of positive windows above background was 13 reads (~940 nts) for 4 kb windows, resulting in 45×10^3 to 50×10^3 windows of 4 kb passing the threshold (Fig. 10).
- 45×10^3 to 50×10^3 windows of 4 kb correspond to a total of 1.8×10^8 to 2×10^8 nts (out of 2.6×10^9 bp in the total mouse genome including repetitive sequences). If ~0.2 genomes are dispensed across tubes, each molecule has a probability of 0.18 to be present in each tube assuming a Poisson distribution, which would correspond to ~4.7×10^8 bp.
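The Poisson estimate quoted above can be reproduced with a short calculation: if a tube receives on average 0.2 genome copies, the probability that a given locus is present at least once is 1 - exp(-0.2). The sketch below uses hypothetical function names; the genome size of 2.6×10^9 bp and the window counts are the values stated in the text.

```python
import math

MOUSE_GENOME_BP = 2.6e9  # total mouse genome size, as stated in the text


def presence_probability(mean_copies):
    """Poisson probability that a locus is present at least once in a tube."""
    return 1.0 - math.exp(-mean_copies)


def mean_copies_for_probability(p):
    """Mean copies per tube needed for a locus to be present with probability p."""
    return -math.log(1.0 - p)


p_present = presence_probability(0.2)        # about 0.18
expected_bp = p_present * MOUSE_GENOME_BP    # about 4.7e8 bp

# Observed coverage from positive 4 kb windows passing the threshold
observed_bp_low = 45e3 * 4000                # 1.8e8 nts
observed_bp_high = 50e3 * 4000               # 2.0e8 nts

# Mean copies per tube giving a detection probability of 0.5 (ln 2, about 0.69)
copies_for_half = mean_copies_for_probability(0.5)
```

Under the same Poisson assumption, a detection probability of 0.5 would correspond to roughly 0.69 genome copies per tube.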
- Identifying contacts between active genes and their regulatory regions is a major current challenge, especially as there is evidence for complex interactions between clustered enhancers and their target genes (Fig. 11A).
- 3C-based technologies underestimate contacting partners of most complex interactions (i.e. interactions involving three or more fragments; O'Sullivan et al. 2013; Fig. 1).
- FISH in interphase nuclei is limited by sensitivity of detection which requires that probes cover several kilobase pairs of genomic sequence, and by spatial resolution, which is limited to detect interactions between genomic sequences separated by several tens of kilobase pairs.
- Novel ligation-free technologies should help detect enhancers that participate in the most complex interactions (Fig. 11B).
- GAM-chIP after RNAPII-S5p ChIP can be performed reliably for different amounts of DNA, especially for 1 pg of DNA yielding GAM-chIP libraries with low complexity (2-10% of detection of 5 kb genomic windows; Fig. 13, 14, 15).
- the GAM-chIP libraries produced were enriched for genomic windows containing active genes, including windows covering the gene promoters (TSS) and the gene termination sites (TES) (Fig. 15C,D). 5 kb genomic windows containing candidate enhancers were also more likely to be detected in the pool of positive windows in each GAM-chIP dataset (Fig. 15D), consistent with the presence of RNAPII-S5p at these regulatory regions.
- Murine fetal liver and fetal brain were dissected from E14.5 wildtype mouse embryos as described previously (Hagege et al. 2007) and processed in parallel for 3C and GAM-ch. The quality of the resulting 'chromatin' preparation was determined using a chromosome conformation capture (3C)-qPCR assay, performed as previously described (Hagege et al. 2007), on the mouse β-globin gene cluster as a reference locus.
- 3C chromosome conformation capture
- Mouse fetal liver and brain tissue from 14.5 dpc embryos were dissected and processed into a single cell suspension as previously described (Hagege et al. 2007), resulting in a single-cell sample containing approximately 2×10^7 cells/mL in 10% (v/v) heat inactivated fetal calf serum in PBS.
- Cells were fixed by addition of 2% formaldehyde/10% FCS/PBS and incubated for 5 or 10 min at room temperature. The crosslinking reaction was then quenched by addition of 1 M glycine solution to give 0.14 M final concentration.
- restriction enzyme buffer: 500 µL; NEB2 buffer
- 7.5 µL of 20% (w/v) SDS solution was added to a final concentration of 0.3%, and incubated (1 h shaking at 900 rpm) to increase chromatin accessibility for restriction enzyme digestion.
- 50 µL of 20% Triton X-100 solution were added (2% final concentration) and incubated at 37°C (1 h shaking) to sequester SDS.
- HindIII: 400 units; BioLab
- digestion was performed overnight (37°C, shaking) followed by addition of 40 µL of 20% SDS solution (1.6% final concentration) and incubation at 65°C (20 min) to inactivate HindIII.
- Aliquots of undigested and digested chromatin were taken for subsequent analysis of digestion efficiency.
- the digested nuclei were transferred to a 50 mL Falcon tube and diluted in 6.125 mL of ligation buffer (66 mM Tris-HCl, pH 7.5; 5 mM DTT; 5 mM MgCl2; 1 mM ATP). After addition of 375 µL of 20% (v/v) Triton X-100 solution (1% final concentration), nuclei were incubated (1 h shaking at 37°C). T4 DNA ligase (Promega) was added (100 Units) and ligation was performed at 16°C for 4 h.
- Reversal of crosslinks was performed by addition of 30 µL of 10 mg/mL proteinase K (300 µg total; Sigma) and incubation at 65°C overnight followed by RNase incubation (300 µg total; Roche) at 37°C (1 h), and by phenol-chloroform extraction and ethanol precipitation (Sigma).
- the 3C material was desalted using Micro Bio-Spin P-30 chromatography columns (BioRad) before qPCR. Each qPCR reaction was performed with ~120 ng of 3C material.
- Quantitative real-time PCR (MJ MiniOpticon, BioRad) was performed with Platinum Taq DNA Polymerase (Invitrogen) and double-dye oligonucleotides (5'FAM + 3'TAMRA) as TaqMan probes, using the following concentrations: 0.1 µL Taq polymerase from kit; 2.5 µL 10x Taq buffer from kit; 0.75 µL MgCl2 (final 1.5 mM) from kit; 0.5 µL (final 200 µM); 0.25 µL of each primer (from stock solution of 0.29 µg/µL); 0.025 µL Taq-probe (final 2.5 pmol); 1-2 µL DNA template and adjusting to 25 µL with H2O.
- a real-time qPCR (95°C for 10 min, 40 cycles with 95°C for 30 seconds, 58°C for 15 seconds and 72°C for 15 seconds) with SYBR Green was performed with the undigested (UND) and digested (D) samples using 2x PCR mix (Promega) on the MJ MiniOpticon PCR engine (BioRad).
- primer sets that amplify across each restriction site of interest (R) were used.
- internal primers (C) not containing a restriction site were used.
- Preparation of crosslinked nuclei from mouse fetal liver cells for GAM-ch is similar to that for 3C. Briefly, fetal liver cells were resuspended in 2% formaldehyde/10% FCS/PBS and the reaction was quenched with glycine after 10 min. Fixed cells were lysed in cold lysis buffer, and nuclei were spun as for 3C (as described above).
- sonication buffer: 50 mM HEPES pH 7.9, 140 mM NaCl, 1 mM EDTA, 1% Triton X-100, 0.1% Na-deoxycholate, 0.1% SDS
- Nuclei were sonicated in 2.5 mL aliquots using a Bioruptor (Diagenode) for 30 min at 30 s on/off intervals at medium energy.
- mouse fetal liver cells were embedded into DNA agarose strings at a density of ~1×10^7 cells/mL (~2×10^5 genomes/cm; prepared according to Dear P.H. et al. 1998. A high-resolution metric HAPPY map of human chromosome 14. Genomics 48:232). Agarose strings of distinct length were melted in 0.5x PCR buffer II (68°C, 10 min) and DNA was diluted in molecular biology-grade H2O (Sigma) into aliquots of ~100 genomes/µL and stored at -20°C.
- Mouse embryonic stem cells (ESCs; 46C cell line, male) were grown in ESGRO medium (Merck, SF001-500P) supplemented by 1000 units/ml LIF (Merck), and chromatin prepared as previously described (Stock et al, 2007). Briefly, cells were treated with 1% formaldehyde (37°C, 10 min) and the reaction stopped with addition of glycine to a final concentration of 0.125 M. Cells were washed in ice-cold PBS, before "swelling" buffer (25 mM HEPES pH 7.9, 1.5 mM MgCl2, 10 mM KCl and 0.1% NP-40) was added to lyse the cells (10 min, 4°C).
- ESGRO medium Merck, SF001-500P
- LIF Merck
- Protein-G-magnetic beads were first incubated with rabbit anti-mouse (IgG+IgM) bridging antibodies (Jackson Immunoresearch; 10 µg per 50 µL beads) for 1 h at 4°C and washed with sonication buffer. Seven hundred µg of chromatin was immunoprecipitated (4°C, overnight) with 10 µg of RNAPII-S5p antibody (clone CTD-4H8, Covance) and 50 µL magnetic beads. ChIP washes and elutions after immunoprecipitation were performed as described previously (Stock et al, 2007).
- crosslinked DNA-protein complexes were eluted twice from beads (65°C, 5 min; and room temperature, 15 min) with 50 mM Tris-HCl pH 8.0, 1 mM EDTA and 1% SDS.
- Half of the eluted immunoprecipitated chromatin was diluted into multiple tubes (based on the measured DNA concentration in the other half of eluted chromatin).
- To measure DNA concentration, half of the eluted chromatin was incubated overnight at 65°C with addition of NaCl (160 mM final concentration) and RNase A (20 µg/ml; Sigma) to reverse cross-linking.
- Oct4 promoter F GGCTCTCCAGAGGATGGCTGAG (SEQ ID NO: 1)
- Oct4 promoter R TCGGATGCCCCATCGCA (SEQ ID NO: 2)
- Nkx2.2 promoter F CAGGTTCGTGAGTGGAGCCC (SEQ ID NO: 5)
- Nkx2.2 promoter R GCGCGGCCTCAGTTTGTAAC (SEQ ID NO: 6)
- HoxA7 promoter R CCGACAACCTCATACCTATTCCTG (SEQ ID NO: 10)
- Illumina libraries were prepared for HT sequencing from WGA-amplified GAM-ch DNA.
- WGA-amplified GAM-ch samples were fragmented using a Covaris shearing system before library preparation.
- Illumina libraries were size selected on agarose gels, enabling visualisation of the amplified DNA fragments, and therefore more careful extraction of appropriate sized fragments.
- QIAgen Gel Extraction kit libraries were quantified by QuBit (Invitrogen) and qPCR, and library size was analysed by Bioanalyser (Agilent). Fragment sizes were within the expected size distribution of 210-600 bp (including adapters) for all libraries.
- RNAPII-S5p Chromatin precipitated with antibodies against RNAPII-S5p was quantified fluorimetrically with PicoGreen (Molecular Probes, Invitrogen) and diluted into multiple tubes (see Table 2 for amounts).
- DNA was extracted by WGA, first by incubation in WGA fragmentation buffer containing PK for 2 h (Exp.001 and Exp.002) or 8 h (Exp.003); subsequent steps were carried out according to the manufacturer's specifications.
- Amplified DNA was purified with MinElute 96 UF PCR Purification Kit (Qiagen) according to manufacturer's instructions.
- DNA fragments from 300-500 bp were size-selected with Agencourt AMPure XP (Beckman Coulter) and the final DNA concentration was determined by PicoGreen fluorimetry (Molecular Probes, Invitrogen) and subjected to Illumina TruSeq Nano library preparation (GAM-chIP Exp001, GAM-chIP Exp002; Table 2) or to Illumina Nextera XT library preparation (GAM-chIP Exp003; Table 2).
- GAM-ch libraries (4-12 pM) were loaded onto the Genome Analyser flow cell.
- the single-stranded DNA fragments bind randomly across the surface of the flow cell due to hybridisation between the adaptor sequences added to DNA ends during library preparation, and the oligonucleotides that coat the flow cell.
- Polymerase-based extension converts each fragment to a cluster of approximately 1000 identical fragments.
- the amount and size of DNA fragments loaded on to the flow cell was optimised to obtain the highest number of non-overlapping clusters following cluster generation.
- Clusters were then sequenced by synthesis, using adaptor-specific primers and incorporation of fluorescent nucleotides. Digital images were taken at each round of nucleotide incorporation and the unique fluorescent signal assigned to each nucleotide enables its correct identification. Sequential images of a given cluster therefore represent the fragment sequence.
- GAM-chIP libraries sequenced on the HiSeq or MiSeq were not imaged for the first thirty sequencing cycles (known as dark cycles) in order to avoid issues relating to low sequence diversity in the WGA adaptor. This step avoids the need for trimming reads after sequencing used in earlier GAM-ch datasets (Fig. 5A).
- DNA reads were first aligned to the reference mouse genome (assembly mm9) using Illumina Extended software (pipeline 1.6) allowing only for two mismatches at most and unique matches only. Un-aligned reads were then trimmed at their 5' or 3' end and aligned to the mm9 genome using Bowtie software, version 0.9.8.1 (Langmead B. et al. 2009. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 10, R25).
- DNA reads were first aligned to the reference mouse genome (assembly mm10) using Bowtie2 and enforcing a minimum mapping quality of 20. Read depth of coverage was calculated using bedtools multibamcov (Quinlan & Hall 2010, BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26:6). Curve fitting was performed in Python using the fmin function from scipy. A combination of two distributions was fitted to the histogram of the number of reads per window.
- a negative binomial distribution represents sequencing noise, and the parameters of the fit for this distribution were used to determine a threshold number of reads X where the probability of observing more than X reads mapping to a single genomic window by chance was less than 0.001. Such a threshold was thus independently determined for each sample, and windows were scored as positive if the number of sequenced reads was greater than the determined threshold.
- a lognormal distribution representing true signal
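The two-distribution model described above can be written as an objective function suitable for a general-purpose minimiser such as `scipy.optimize.fmin`. The sketch below is an assumption-laden illustration, not the original fitting code: the mixture weight `w`, the negative binomial parameterisation (`r`, `p`), the sum-of-squares objective and all function names are choices made here for the example. It uses only the standard library, so the actual minimisation step is left out.

```python
import math


def nb_pmf(k, r, p):
    """Negative binomial probability (noise component), size r, probability p."""
    return math.exp(math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
                    + r * math.log(p) + k * math.log1p(-p))


def lognormal_pdf(x, mu, sigma):
    """Lognormal density (signal component); zero for x <= 0."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def mixture(k, params):
    """Weighted sum of the noise and signal components at read count k."""
    w, r, p, mu, sigma = params
    return w * nb_pmf(k, r, p) + (1.0 - w) * lognormal_pdf(k, mu, sigma)


def sum_squared_error(params, histogram):
    """Objective to minimise: squared distance between model and histogram.

    histogram: dict mapping reads-per-window -> fraction of windows.
    """
    return sum((frac - mixture(k, params)) ** 2 for k, frac in histogram.items())


# Synthetic histogram generated from known (arbitrary) parameters
true_params = (0.7, 2.0, 0.5, 3.0, 0.5)
synthetic_hist = {k: mixture(k, true_params) for k in range(100)}
```

The fitted `r` and `p` of the noise component are the values that a threshold calculation of the kind described in the following passages would take as input.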
- positive windows were also called using JAMM (Ibrahim et al, 2015) in the peak mode with default settings.
- ChIP-seq libraries for RNAPII-S5p and control (using non-specific antibody against plant steroid digoxigenin) were prepared from 10 ng of immunoprecipitated DNA (as measured by PicoGreen quantification) with corresponding antibodies using the NEBNext ChIP-Seq Library Prep Master Mix Set for Illumina (NEB, #E6240) following the NEB protocol, with some modifications.
- the intermediate products from the different steps of the NEB protocol were purified using MinElute PCR purification kit (Qiagen, #28004).
- Adaptors, PCR amplification primers and indexing primers were from the Multiplexing Sample Preparation Oligonucleotide Kit (Illumina, # PE-400-1001).
- Samples were PCR amplified prior to size selection of DNA fragments (250-600 bp) on an agarose gel. After purification by QIAquick Gel Extraction kit (Qiagen, #28704), libraries were quantified by qPCR using Kapa Library Quantification Universal Kit (Kapa Biosystems, #KK4824). Library size distribution was assessed by 2100 Bioanalyzer (Agilent) with High Sensitivity DNA analysis Kit (Agilent, #5067-4626) before high-throughput sequencing. Libraries were quantified by Qubit and sequenced on Illumina HiSeq2000 (single-end sequencing, 51 nucleotides), according to the manufacturer's instructions.
- Sequenced reads were aligned to the mouse genome (assembly mm10, December 2011) using Bowtie2 version 2.0.5 (Langmead and Salzberg, 2012), with default parameters. Duplicated reads (i.e. identical reads, aligned to the same genomic location) occurring more often than a threshold were removed. The threshold is computed for each dataset as the 95th percentile of the frequency distribution of reads.
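The duplicate-filtering rule can be sketched as follows. Two points are assumptions made for this illustration, since the text does not spell them out: the percentile is computed with the nearest-rank method over the per-location read frequencies, and reads at over-represented locations are capped at the threshold rather than discarded entirely. The function name and the toy reads are hypothetical.

```python
from collections import Counter


def cap_duplicates(reads, percentile=95):
    """Limit identical reads to a dataset-specific maximum count.

    reads: list of hashable read identities, e.g. (chromosome, position).
    The threshold is the nearest-rank percentile of the distribution of
    how many times each distinct read occurs (assumption).
    Returns (kept_reads, threshold).
    """
    frequencies = sorted(Counter(reads).values())
    # Nearest-rank percentile using integer ceiling division
    rank = max(1, (percentile * len(frequencies) + 99) // 100)
    threshold = frequencies[rank - 1]

    kept, seen = [], Counter()
    for read in reads:
        if seen[read] < threshold:
            kept.append(read)
            seen[read] += 1
    return kept, threshold


# Toy data: 19 locations seen once, one location seen 10 times
toy_reads = [("chr1", i) for i in range(19)] + [("chr2", 100)] * 10
kept_reads, dup_threshold = cap_duplicates(toy_reads)
```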
- For RNAPII-S5p and control ChIP enrichment at enhancers, the list of enhancers from Whyte et al. 2013 was used.
- TPM Transcripts per Million
- Genes in the top 25% by expression were classified as active, whilst genes in the bottom 25% by expression were classified as silent.
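The active/silent classification can be sketched as a simple ranking by expression. The function name and the toy TPM values are hypothetical, and ties in expression are broken arbitrarily by the sort order in this sketch.

```python
def classify_by_expression(tpm, fraction=0.25):
    """Split genes into active (top fraction) and silent (bottom fraction).

    tpm: dict mapping gene name -> expression level in TPM.
    Returns (active_genes, silent_genes) as sets.
    """
    ranked = sorted(tpm, key=tpm.get)          # lowest expression first
    n = int(len(ranked) * fraction)
    silent = set(ranked[:n])
    active = set(ranked[len(ranked) - n:]) if n else set()
    return active, silent


# Toy data: eight genes with increasing expression
toy_tpm = {f"gene{i}": float(i) for i in range(8)}
toy_active, toy_silent = classify_by_expression(toy_tpm)
```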
- paired-end (2×100 bp) reads from mRNA-seq were aligned against the mouse genome using STAR (Spliced Transcripts Alignment to a Reference, v2.4.2a; Dobin et al, 2013) and expression levels were estimated in TPM with RSEM (RNA-Seq by Expectation-Maximization, v1.2.25; Li and Dewey, 2011).
- the reference for STAR and RSEM was produced from the Mouse Genome version mm10, providing the gtf annotation from UCSC Known Genes (mm10, version 6) and associated isoform-gene relationship information from the Known Isoforms table. Both tables were downloaded from the UCSC Table browser (http://genome.ucsc.edu/cgi-bin/hgTables).
- the detection frequency of each window overlapped by the gene, ± one window upstream/downstream, was calculated as the number of GAM-chIP samples in which the window was detected divided by the total number of GAM-chIP samples. Since each window is detected with a different frequency, each window can be described by its own binomial distribution.
- the expected distribution of the number of positive windows from the same gene detected simultaneously in a single GAM-chIP sample was calculated as the convolution of the binomial distributions for each component window.
- the average expected number of positive windows per GAM-chIP sample was calculated as the sum of the window detection frequencies. For each gene, the number of tubes with more than double this average was counted and compared to the expected number of tubes with more than double the average. The distribution of observed vs. expected values was plotted and compared between active genes and silent genes.
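The expected distribution described above, obtained by convolving one presence/absence distribution per window, can be sketched as follows. The function names are hypothetical and the detection frequencies in the example are arbitrary; only the total of 182 samples is taken from the text.

```python
def positive_window_distribution(frequencies):
    """Distribution of the number of positive windows in a single sample.

    frequencies: per-window detection frequencies for one gene.
    Each window contributes an independent presence/absence trial; the
    result is the convolution of these trials, with pmf[k] the probability
    that exactly k windows of the gene are positive in the same sample.
    """
    pmf = [1.0]
    for f in frequencies:
        nxt = [0.0] * (len(pmf) + 1)
        for k, prob in enumerate(pmf):
            nxt[k] += prob * (1.0 - f)   # window absent
            nxt[k + 1] += prob * f       # window present
        pmf = nxt
    return pmf


def expected_tubes_above_double_mean(frequencies, n_tubes):
    """Expected number of tubes with more than twice the mean positive windows."""
    pmf = positive_window_distribution(frequencies)
    mean = sum(frequencies)
    tail = sum(prob for k, prob in enumerate(pmf) if k > 2 * mean)
    return n_tubes * tail


# Arbitrary example: a gene spanning four windows, each detected in 10% of samples
expected_tubes = expected_tubes_above_double_mean([0.1] * 4, n_tubes=182)
```

Comparing this expected count with the number of tubes actually observed above the same cut-off gives the observed vs. expected values described in the text.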
- Enhancer loops appear stable during development and are associated with paused polymerase. Nature. 2014 Aug 7;512(7512):96-100.
- Chromatin Interaction Analysis with Paired-End Tag (ChIA-PET) sequencing technology and application. BMC Genomics. 2014;15 Suppl 12:S11.
- CTCF mediates long-range chromatin looping and local histone modification in the beta-globin locus.
Abstract
The present invention relates to the field of analysis of the three-dimensional structure of the genome, i.e., genome architecture mapping on chromatin (GAM-ch). The invention provides a method for determining the interaction of a plurality of nucleic acid loci in a compartment comprising nucleic acids, such as the cell nucleus, comprising separating nucleic acids from each other depending on their interaction in the compartment by crosslinking nucleic acids with each other directly or indirectly, fragmenting the nucleic acids of the compartment to obtain fragments and/or complexes of crosslinked fragments, and dividing the fragmented nucleic acids to obtain a collection of fractions such that each fraction contains, on average, less than one copy of each locus; determining the presence or absence of the plurality of loci in said fractions; and determining the co-segregation of said plurality of loci in the fractions. Co-segregation may then be analysed with statistical methods to determine interactions. The method can be used, e.g., for identifying the frequency of interactions across a population of cells between a plurality of loci; mapping the architecture of loci and/or the genome, e.g., in the nucleus, an organelle, a microorganism or a virus; identifying regulatory regions directing expression of a specific gene through spatial contacts; identifying spatial contacts between loci that depend on their co-association with specific protein(s) or RNA; and/or diagnosing a disease associated with a disturbed co-segregation of loci. Chromatin immunoprecipitation (ChIP) can be combined with the method of the invention.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP15161949 | 2015-03-31 | | |
EP15161949.1 | 2015-03-31 | | |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016156469A1 (fr) | 2016-10-06 |
Family
ID=52811014
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/EP2016/057025 WO2016156469A1 (fr) | 2015-03-31 | 2016-03-31 | Genome architecture mapping on chromatin |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2016156469A1 (fr) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018045137A1 (fr) * | 2016-09-02 | 2018-03-08 | Ludwig Institute For Cancer Research Ltd | Genome-wide identification of chromatin interactions |
CN111727248A (zh) * | 2017-09-25 | 2020-09-29 | 弗雷德哈钦森癌症研究中心 | High-efficiency targeted in situ genome-wide profiling |
CN112599189A (zh) * | 2020-12-29 | 2021-04-02 | 北京优迅医学检验实验室有限公司 | Data quality assessment method for whole-genome sequencing and its application |
EP3988669A1 (fr) | 2020-10-22 | 2022-04-27 | Max-Delbrück-Centrum für Molekulare Medizin in der Helmholtz-Gemeinschaft | Method for nucleic acid detection by oligo hybridization and PCR-based amplification |
CN114842914A (zh) * | 2022-04-24 | 2022-08-02 | 山东大学 | Deep-learning-based chromatin loop prediction method and system |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100081141A1 (en) * | 2008-08-06 | 2010-04-01 | University Of Southern California | Genome-Wide Chromosome Conformation Capture |
WO2012159025A2 (fr) * | 2011-05-18 | 2012-11-22 | Life Technologies Corporation | Chromosome conformation analysis |
- 2016-03-31 WO PCT/EP2016/057025 patent/WO2016156469A1/fr active Application Filing
Non-Patent Citations (5)
Title |
---|
ANA POMBO ET AL: "Three-dimensional genome architecture: players and mechanisms", NATURE REVIEWS MOLECULAR CELL BIOLOGY, vol. 16, no. 4, 11 March 2015 (2015-03-11), pages 245 - 257, XP055207128, ISSN: 1471-0072, DOI: 10.1038/nrm3965 * |
DEAR P H ET AL: "HAPPY MAPPING: A PROPOSAL FOR LINKAGE MAPPING THE HUMAN GENOME", NUCLEIC ACIDS RESEARCH, INFORMATION RETRIEVAL LTD, vol. 17, no. 17, 12 September 1989 (1989-09-12), pages 6795 - 6807, XP000371654, ISSN: 0305-1048 * |
JENNIFER L CRUTCHLEY ET AL: "Chromatin conformation signatures: ideal human disease biomarkers?", BIOMARKERS IN MEDICINE, vol. 4, no. 4, 1 August 2010 (2010-08-01), pages 611 - 629, XP055155789, ISSN: 1752-0363, DOI: 10.2217/bmm.10.68 * |
PHILIPPE COLLAS: "The Current State of Chromatin Immunoprecipitation", MOLECULAR BIOTECHNOLOGY, vol. 45, no. 1, 1 May 2010 (2010-05-01), pages 87 - 100, XP055021496, ISSN: 1073-6085, DOI: 10.1007/s12033-009-9239-8 * |
TOLHUIS B ET AL: "Looping and interaction between hypersensitive sites in the active beta-globin locus", MOLECULAR CELL, CELL PRESS, CAMBRIDGE, MA, US, vol. 10, no. 6, 1 December 2002 (2002-12-01), pages 1453 - 1465, XP002301469, ISSN: 1097-2765, DOI: 10.1016/S1097-2765(02)00781-5 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018045137A1 (fr) * | 2016-09-02 | 2018-03-08 | Ludwig Institute For Cancer Research Ltd | Genome-wide identification of chromatin interactions |
CN111727248A (zh) * | 2017-09-25 | 2020-09-29 | 弗雷德哈钦森癌症研究中心 | High-efficiency targeted in situ genome-wide profiling |
EP3988669A1 (fr) | 2020-10-22 | 2022-04-27 | Max-Delbrück-Centrum für Molekulare Medizin in der Helmholtz-Gemeinschaft | Method for nucleic acid detection by oligo hybridization and PCR-based amplification |
WO2022084528A1 (fr) | 2020-10-22 | 2022-04-28 | Max-Delbrück-Centrum Für Molekulare Medizin In Der Helmholtz-Gemeinschaft | Method for nucleic acid detection by oligo hybridization and PCR-based amplification |
CN112599189A (zh) * | 2020-12-29 | 2021-04-02 | 北京优迅医学检验实验室有限公司 | Data quality assessment method for whole-genome sequencing and its application |
CN114842914A (zh) * | 2022-04-24 | 2022-08-02 | 山东大学 | Deep-learning-based chromatin loop prediction method and system |
CN114842914B (zh) * | 2022-04-24 | 2024-04-05 | 山东大学 | Deep-learning-based chromatin loop prediction method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
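JP7127104B2 (ja) | Contiguity-preserving transposition | |
EP3334823B1 (fr) | Method and kit for generating CRISPR/Cas guide RNAs | |
KR102425438B1 (ko) | Genome-wide unbiased identification of DSBs evaluated by sequencing (GUIDE-Seq) | |
CN107586835B (zh) | Method for constructing a next-generation sequencing library based on single-stranded adapters, and its application | |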
Galupa et al. | A conserved noncoding locus regulates random monoallelic Xist expression across a topological boundary | |
JP2022095676A (ja) | Recovery of long-range linkage information from preserved samples | |
US20200248229A1 (en) | Unbiased detection of nucleic acid modifications | |
US11807896B2 (en) | Physical linkage preservation in DNA storage | |
WO2016156469A1 (fr) | Genome architecture mapping on chromatin | |
US10526639B2 (en) | Genome architecture mapping | |
EP3746566A1 (fr) | Sample preparation for DNA linkage recovery | |
US20220136041A1 (en) | Off-Target Single Nucleotide Variants Caused by Single-Base Editing and High-Specificity Off-Target-Free Single-Base Gene Editing Tool | |
WO2014193980A1 (fr) | Substantially unbiased amplification of genomes | |
Velychko et al. | CDK7 kinase activity promotes RNA polymerase II promoter escape by facilitating initiation factor release | |
WO2021119550A1 (fr) | Method for determining 3D genome architecture at base-pair resolution, and further uses thereof | |
Pinglay et al. | Synthetic genomic reconstitution reveals principles of mammalian Hox cluster regulation | |
Goldberg et al. | Engineered transcription-associated Cas9 targeting in eukaryotic cells | |
US20240150830A1 (en) | Phased genome scale epigenetic maps and methods for generating maps | |
Grillo et al. | ZBTB24 is a conserved multifaceted transcription factor at genes and centromeres that governs the DNA methylation state and expression of satellite repeats | |
Lin et al. | DNA sequence preference for de novo centromere formation on a Caenorhabditis elegans artificial chromosome | |
Liu et al. | De novo assembly and delivery of synthetic megabase-scale human DNA into mouse early embryos | |
US20180087089A1 (en) | Method for Analysing Nuclease Hypersensitive Sites | |
US20240287609A1 (en) | Compositions and methods for large-scale in vivo genetic screening | |
Javed et al. | Introduction to Molecular Genomics | |
Herbst | Scalable approaches for gene tagging and genome walking sequencing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 16712365; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | EP: PCT application non-entry in European phase | Ref document number: 16712365; Country of ref document: EP; Kind code of ref document: A1 |