WO2024222158A1 - Targeted high-throughput sequencing method for detecting splicing isoform - Google Patents
- Publication number
- WO2024222158A1 (application PCT/CN2024/077716)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- primer
- gene
- sequencing
- splicing
- sequence
Definitions
- The present invention relates to the field of biotechnology, and in particular to a targeted high-throughput sequencing method for detecting splicing isoforms.
- Alternative splicing (AS)
- RNA splicing is an essential step in the maturation of pre-mRNA into mRNA. Under the action of the spliceosome, introns in the pre-mRNA are removed and the exons and UTR sequences are joined. When alternative splicing occurs, the same pre-mRNA can generate two or more splicing isoforms. Alternative splicing can be grouped into five forms, of which the two most common are intron retention and exon skipping.
- Intron retention means that an intron is not removed during pre-mRNA splicing and remains in the mRNA; exon skipping means that an exon is removed by the spliceosome during pre-mRNA splicing, so that the exon is missing from the resulting mRNA.
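The two splicing forms described above can be illustrated with a minimal sketch. The segment names (`E1`, `I1`, ...) and the `splice` helper are hypothetical and serve only to show which segments remain in the mature mRNA; they are not part of the disclosed method.

```python
# Hypothetical toy pre-mRNA: alternating exons and introns (names are illustrative).
SEGMENTS = [
    ("E1", "exon"), ("I1", "intron"),
    ("E2", "exon"), ("I2", "intron"),
    ("E3", "exon"),
]


def splice(segments, retained_introns=(), skipped_exons=()):
    """Return the ordered segment names that remain in the mature mRNA."""
    kept = []
    for name, kind in segments:
        if kind == "intron" and name not in retained_introns:
            continue  # normal splicing removes the intron
        if kind == "exon" and name in skipped_exons:
            continue  # exon skipping removes the exon together with its flanking introns
        kept.append(name)
    return kept


canonical = splice(SEGMENTS)                                  # E1-E2-E3
intron_retention = splice(SEGMENTS, retained_introns={"I1"})  # E1-I1-E2-E3
exon_skipping = splice(SEGMENTS, skipped_exons={"E2"})        # E1-E3
```

In this toy model each isoform is simply a different ordered subset of the same pre-mRNA segments, which is why reads spanning the junctions between segments are the informative ones.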
- RNA library construction kits currently on the market mainly use different fragmentation methods to construct total mRNA sequencing libraries; there is no targeted enrichment library construction method for splicing isoforms, especially for unknown splicing isoforms.
- The main reason is that classic RNA-seq randomly fragments total RNA or messenger RNA (mRNA), reverse transcribes the fragments with random primers (random hexamers) and/or oligo(dT) primers, converts the product into double-stranded DNA, and then completes library construction by adapter ligation.
- This approach targets only the whole transcriptome; it cannot restrict the analysis to a small part of the transcriptome of interest, and its cost is high.
- Because the fragment positions obtained by RNA-seq are random and usually do not span splice junctions, extremely high sequencing depth is required to detect different splicing isoforms accurately, especially when many potential alternative splicing sites are to be verified, so the sequencing cost is very high.
- The commercial targeted RNA sequencing kits currently on the market are all designed and developed for known targets and cannot meet the requirements for targeted sequencing of unknown splicing isoforms.
- The targeted RNA sequencing methods reported in the existing literature are also difficult to apply in enterprises and for non-research purposes, mainly because they rely on commercial companies such as Agilent Technologies and IDT to design and synthesize hybridization probes for targeted enrichment.
- Another major limitation is that the existing method (Agilent) can analyze only known splicing isoforms: it requires primer design, and amplification primers for different splicing isoforms may compete with one another, even in a single droplet with highly diluted template.
- The classic low-throughput method for quantitative analysis of splicing isoforms is qRT-PCR. However, differences in amplification efficiency between splicing isoforms cause systematic deviations in the fluorescence signal, and only relative quantification can be achieved even after cumbersome calibration steps. In addition, when detecting low-abundance splicing isoforms, the requirements on primer design, experimental operation and equipment are high, and the quantitative results are difficult to reproduce.
- The purpose of the present disclosure is to provide a targeted high-throughput sequencing method for detecting splicing isoforms.
- The present disclosure provides a method for establishing a sequencing library for high-throughput sequencing, comprising the following steps:
- step (2) connecting the oligonucleotide fragment with an alkyne modification at the 5′ end to the cDNA fragment obtained in step (1) by a click chemistry reaction, wherein the oligonucleotide fragment comprises a random sequence and a complementary segment sequence of universal sequencing primer 1 (seq1);
- step (3) using the reaction product of step (2) as a template, performing PCR amplification;
- The present disclosure provides a high-throughput sequencing library constructed according to the above method.
- The present disclosure provides a targeted high-throughput sequencing method for detecting splicing isoforms, comprising the following steps:
- The present disclosure provides a use of the method for establishing a sequencing library for high-throughput sequencing, the high-throughput sequencing library and/or the targeted high-throughput sequencing method in the evaluation of off-target events;
- The off-target event assessment includes the following:
- The throughput of traditional qRT-PCR methods is limited by the number of fluorescence channels in the detection instrument, bleed-through between fluorophores, and mutual interference between primers in the reaction system.
- The throughput of commonly used qRT-PCR is four-plex, while the HTAS (High-throughput Targeted Alternative Splicing) analysis platform disclosed in the present invention can simultaneously analyze 5-100 (or more) pre-mRNA splicing events;
- A unique advantage of HTAS is that adding a certain proportion of 3'-modified dNTPs during reverse transcription randomly terminates extension of the first-strand cDNA, so the same splicing isoform generates cDNA products of different lengths, and amplification efficiency during library construction is no longer affected by differences in the lengths of the PCR products of different splicing isoforms.
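Why random termination yields a spread of cDNA lengths can be sketched with a simple probabilistic model. The model below is an assumption made for illustration, not something stated in the source: at every incorporated position the reverse transcriptase is assumed to pick a terminating nucleotide independently with probability `p_term`, so the product length follows a geometric distribution truncated at the template length. The function name and the example values (1% terminator, 500 nt template) are hypothetical.

```python
def termination_length_distribution(p_term, template_len):
    """Return P(cDNA length == k) for k = 1..template_len.

    Assumes independent termination with probability p_term at each position;
    products that reach the end of the template are pooled in the last bin.
    """
    if not 0.0 <= p_term <= 1.0:
        raise ValueError("p_term must be between 0 and 1")
    if template_len < 1:
        raise ValueError("template_len must be at least 1")
    probs = []
    survive = 1.0  # probability that extension has not terminated so far
    for k in range(1, template_len + 1):
        if k < template_len:
            probs.append(survive * p_term)  # terminated exactly at position k
        else:
            probs.append(survive)           # reached the end of the template
        survive *= 1.0 - p_term
    return probs


# Illustrative values only: 1% terminator fraction, 500-nt template.
dist = termination_length_distribution(0.01, 500)
```

Under this assumed model, every isoform of a given gene produces the same broad mix of product lengths downstream of the primer, which is consistent with the text's point that isoform length no longer dominates amplification efficiency.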
- The N5 random sequences in the 5' linker can remove systematic bias (PCR skewing) introduced during PCR amplification; mixed-sample experiments have shown that the correlation between the values measured by HTAS and the theoretical values of the mixed samples is r > 0.99;
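One common way a random tag is used to remove PCR skewing is to collapse reads that carry the same tag into a single original molecule before counting. The sketch below assumes that reads have already been assigned to an isoform and that the 5-nt random tag and the cDNA end position have been extracted; the tuple layout, the function name and the choice to combine tag and end position are assumptions, not details taken from the source.

```python
from collections import defaultdict


def count_unique_molecules(reads):
    """Count distinct (random tag, cDNA end) pairs per isoform.

    reads: iterable of (isoform, random_tag, cdna_end) tuples (hypothetical layout).
    PCR duplicates share the same tag and end position and are counted once.
    """
    molecules = defaultdict(set)
    for isoform, tag, cdna_end in reads:
        molecules[isoform].add((tag, cdna_end))
    return {isoform: len(keys) for isoform, keys in molecules.items()}


# Toy input: one over-amplified IR molecule and three distinct spliced molecules.
reads = [("IR", "ACGTA", 120)] * 50 + [
    ("spliced", "TTGCA", 95),
    ("spliced", "GGATC", 95),
    ("spliced", "TTGCA", 140),
]
counts = count_unique_molecules(reads)
```

In the toy input the raw read ratio is 50:3 in favor of the IR isoform, while the collapsed molecule ratio is 1:3, which illustrates how duplicate collapsing can correct amplification skew.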
- Detection sensitivity: the minimum proportion of intron-retained (IR) splicing isoform detected in the embodiments is 0.1%. With increasing sequencing depth, the detection sensitivity for IR is expected to reach 1/10^5 or lower, which is much better than that of traditional RNA-seq, especially for low-abundance transcripts;
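The link between sequencing depth and detectable isoform fraction can be made concrete with a simple sampling estimate. Assuming each on-target read is drawn independently and comes from the rare isoform with probability `fraction` (an idealization that ignores sequencing errors and library bias), the number of reads needed to observe at least one such read with a given confidence is ln(1 - confidence) / ln(1 - fraction), rounded up. The function name and the 95% default are illustrative choices.

```python
import math


def min_reads_for_detection(fraction, confidence=0.95):
    """Smallest read count n with P(at least one rare-isoform read) >= confidence."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - fraction))


reads_for_0_1_percent = min_reads_for_detection(0.001)  # the 0.1% level from the text
reads_for_1_in_1e5 = min_reads_for_detection(1e-5)      # the projected 1/10^5 level
```

Because the required depth grows roughly in inverse proportion to the fraction, going from 0.1% to 1/10^5 needs about a hundred times more on-target reads per gene, which is feasible mainly when reads are concentrated on the targeted genes.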
- Unknown splicing isoforms can be detected: Due to the diversity of tissues and cells, a large number of splicing isoforms in the current detection system have not been annotated in the transcriptome. Traditional qRT-PCR schemes are mainly designed for known splicing isoforms, while the HTAS platform can detect both known and unknown splicing isoforms, eliminating the interference of missing annotations in isoform analysis and improving the specificity of quantitative analysis;
- Trans-splicing is a new technology for the targeted editing of mRNA, but the off-target phenomenon is one of the main bottlenecks in its clinical application. Since off-target events are expected to occur in any pre-mRNA in the transcriptome, there is currently no method to systematically evaluate the off-target phenomenon.
- HTAS only needs to know the sequence of the trans-splicing molecule (Pre-mRNA Trans-splicing Molecule) to simultaneously perform qualitative and quantitative analysis of on-target trans-splicing and off-target trans-splicing;
- Figure 1 is a schematic diagram of the principle of alternative splicing.
- Figure 2 is a schematic diagram of the high-throughput sequencing library construction (Scheme 1).
- Figure 3 is a schematic diagram of the high-throughput sequencing library construction (Scheme 2).
- Figure 4 is a schematic diagram of the construction principle of a high-throughput sequencing library (Scheme 3).
- Figure 5 is a schematic diagram showing the correlation between the actual detection values of splicing isoform IVT products with different mixing ratios and the theoretical values of mixed samples.
- The term "plurality" as used in the present disclosure refers to two or more.
- "And/or" describes an association between associated objects and indicates that three relationships may exist.
- For example, "A and/or B" can represent: A exists alone, A and B exist at the same time, or B exists alone.
- The character "/" generally indicates that the objects before and after it are in an "or" relationship.
- The definitions and explanations of related terms are provided below.
- The term "about" or "approximately" means within plus or minus 10% of a given value or range. Where an integer is required, the term means within plus or minus 10% of a given value or range, rounded up or down to the nearest integer.
- The phrase "substantially identical" is understood to mean a sequence that exhibits at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or more sequence identity to a reference polypeptide sequence.
- For nucleic acid sequences, the term is understood to mean a nucleotide sequence that exhibits at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or more sequence identity to a reference nucleic acid sequence.
- The Oligo(dT) primer used in the present disclosure refers to a repetitive oligonucleotide sequence composed of about 12-25 thymine (T) bases, which can specifically anneal to the poly(A) tail of eukaryotic mRNA. It is therefore not suitable for RNA lacking a poly(A) tail, such as prokaryotic RNA or miRNA, nor for degraded RNA, such as RNA in FFPE samples.
- Oligo(dT)n VN is composed of a fixed, specific sequence, followed by about 20 T bases, followed by the degenerate bases VN. The fixed, specific sequence facilitates the design of universal downstream PCR primers.
- VN in the Oligo(dT)n VN sequence refers to the anchor bases at the 3' end.
- V represents dATP, dGTP or dCTP;
- N represents any one of dATP, dTTP, dGTP and dCTP.
- The function of the anchor bases is to bind specifically at the 5' end of the poly(A) tail, which prevents an excessive number of T bases from being reverse transcribed.
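The degenerate VN anchor described above corresponds to a defined set of concrete primer sequences, which can be enumerated with a short sketch. The adapter sequence `ACGTACGTAC` is a made-up placeholder for the fixed, specific sequence (the source does not give it here), and the function name is hypothetical.

```python
from itertools import product

V_BASES = "ACG"   # V: any base except T
N_BASES = "ACGT"  # N: any base


def anchored_oligo_dt(adapter, t_count=20):
    """Enumerate all concrete primers of the form adapter + (T)n + V + N (5' to 3')."""
    return [adapter + "T" * t_count + v + n for v, n in product(V_BASES, N_BASES)]


# Placeholder adapter; the real fixed sequence is not specified in this section.
primers = anchored_oligo_dt("ACGTACGTAC", t_count=20)
```

Because V is never T, every primer in the pool has a non-T base immediately after the T run, which is what pins the primer to the start of the poly(A) tail rather than letting it anneal anywhere within it.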
- The term "click chemistry" as used in the present disclosure is well known in the art and generally refers to fast, regiospecific reactions whose products are easy to purify. Click chemistry is a class of reactions that allow a selected substrate to be connected to a specific molecule. It is not a single specific reaction; rather, it describes a way of generating products that follows examples in nature, which also produces substances by connecting small modular units. In many applications, click reactions connect biomolecules and reporter molecules. Click chemistry is not limited to biological conditions: the concept of "click" reactions has been used in pharmacology and various biomimetic applications. However, it is particularly useful in the detection, localization and identification of biomolecules.
- The classic click reaction is the copper-catalyzed reaction of azides and alkynes to form a five-membered heteroatom ring: Cu(I)-catalyzed azide-alkyne cycloaddition (CuAAC).
- The term "contacting" means bringing one reactant, reagent, solvent, catalyst, reactive group, etc. into contact with another reactant, reagent, solvent, catalyst, reactive group, etc.
- The reactants, reagents, solvents, catalysts, reactive groups, etc. may be added individually, simultaneously, or separately and may be added in any order that achieves the desired result.
- The reactants, reagents, solvents, catalysts, reactive groups, etc. may be added with or without heating or cooling equipment and may optionally be added under an inert atmosphere.
- complementary used in this disclosure refers to the broad concept of sequence complementarity between regions of two polynucleotide chains or between two nucleotides by base pairing. It is known that adenine nucleotides can form specific hydrogen bonds ("base pairing") with thymine or uracil nucleotides. Similarly, it is known that cytosine nucleotides can base pair with guanine nucleotides.
- the term "library" when used with respect to nucleic acids is intended to mean a collection of nucleic acids having different chemical compositions (e.g., different sequences, different lengths, etc.).
- the nucleic acids in a library will be different species having common features or properties of a genus or class, but differing to some extent in other respects.
- a library may contain nucleic acid species that differ in nucleotide sequence but are similar in having a sugar-phosphate backbone. Libraries may be created using techniques known in the art.
- as exemplified herein, the nucleic acids may include nucleic acids obtained from any source, including, for example, a digest of a genome (e.g., a human genome) or of a mixture of genomes. In another example, the nucleic acids may be those obtained from a metagenomic study of a particular environment or ecosystem. The term also includes artificially created nucleic acid libraries, such as DNA libraries.
- random primer or “random hexamer primer” or “Random hexamer” or “Random hexamer primer” as used in this disclosure are well known in the art and generally refer to short oligodeoxyribonucleotides (d(N)6) of random sequence that anneal to random complementary sites on the target DNA or RNA and are used as primers for DNA synthesis by DNA polymerase or reverse transcriptase.
- AzNTP (3'-azido-2',3'dNTP) is an azide deoxynucleotide, wherein the base is selected from adenine, guanine, cytosine and thymine.
- Add-on PCR refers to PCR in which the primers carry additional sequences beyond the sequences complementary to the template. These additional sequences do not participate in annealing in the current round of PCR, but because the resulting PCR products contain them, the products can serve as templates for the next PCR reaction.
- the present disclosure provides a method for establishing a sequencing library for high-throughput sequencing, comprising the following steps:
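- step (1) performing reverse transcription on sample RNA using a reverse transcription primer, with common dNTPs and 3'-modified dNTPs added, to obtain a first strand of cDNA;
- step (2) connecting the oligonucleotide fragment with an alkyne modification at the 5′ end to the cDNA fragment obtained in step (1) by a click chemistry reaction, wherein the oligonucleotide fragment comprises a random sequence and a complementary segment sequence of universal sequencing primer 1 (seq1);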
- step (3) using the reaction product of step (2) as a template, performing PCR amplification;
- the click chemistry reaction refers to a CuAAC click reaction, i.e., a copper ion-catalyzed azide-alkyne cycloaddition reaction.
- the reverse transcription PCR reaction in step (1) includes five reaction stages: the specific reaction conditions are 25°C for 10 min, 37°C for 10 min, 50°C for 45 min, 85°C for 2 min, and 12°C for maintenance.
- the enzymes used in the reverse transcription PCR reaction in step (1) include HiScript III Reverse Transcriptase (R302-01, Nanjing Novogene Biotechnology Co., Ltd.), SuperScript TM III Reverse Transcriptase (18080093, ThermoFisher SCIENTIFIC), HiFi II M-MLV (H-) Reverse Transcriptase (CW0743, Kangwei Century Biotechnology Co., Ltd.), Reverse Transcriptase [M-MLV, RNaseH-] (AE101-02, Beijing Quanshijin Biotechnology Co., Ltd.), MutiScript II Reverse Transcriptase (MD311, Feipeng Biotechnology Co., Ltd.).
- the reverse transcription primer described in step (1) is a random primer or a gene-specific primer group 1 designed based on the downstream exon of the retained intron of the targeted gene, and the 5' end of the primer in the gene-specific primer group 1 carries a universal sequencing primer 2 sequence (seq2).
- the click chemistry reaction system in step (2) further includes vitamin C, a copper (II)-TBTA composition, and DMSO.
- the PCR amplification reaction in step (3) includes four reaction stages: the first PCR amplification reaction is 1 cycle, and the specific reaction conditions are 94°C for 1 min, 60°C for 30 s, and 68°C for 10 min; the second PCR amplification reaction includes 12 cycles, and the specific reaction conditions are 94°C for 30 s, 60°C for 30 s, and 68°C for 2 min; the third PCR amplification reaction includes 1 cycle, and the specific reaction conditions are 68°C for 5 min; the fourth PCR amplification reaction includes 1 cycle, and the specific reaction conditions are 12°C.
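The four-stage cycling program above can be written down as data and its programmed hold time summed. This is a minimal sketch: the tuple layout and the function name are assumptions for illustration, ramp times between temperatures are ignored, and the open-ended 12°C hold is left out.

```python
# Each stage: (label, cycles, [(temperature_C, seconds), ...]).
# The final 12 °C hold is open-ended, so it is not included.
PROGRAM = [
    ("first stage", 1, [(94, 60), (60, 30), (68, 600)]),
    ("second stage", 12, [(94, 30), (60, 30), (68, 120)]),
    ("third stage", 1, [(68, 300)]),
]

def total_minutes(program) -> float:
    """Sum the programmed hold times, ignoring ramping between temperatures."""
    seconds = sum(cycles * sum(s for _temp, s in steps)
                  for _label, cycles, steps in program)
    return seconds / 60

print(total_minutes(PROGRAM))  # 52.5
```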
- the PCR reaction system in step (3) further includes PCR Buffer solutions such as MgCl 2 , DMSO, Tris-HCl, EDTA, NaCl, and KCl.
- the enzyme used in the PCR reaction in step (3) is Taq DNA polymerase.
- the PCR amplification primers used in step (3) include universal sequencing primer 1 and gene-specific primer group 2, wherein the 5' end of the primer in the gene-specific primer group 2 carries the universal sequencing primer 2 sequence.
- both gene-specific primer groups 1 and 2 are designed based on exons downstream of alternative splicing events of retained introns of specific targeted genes, wherein the targeting site of gene-specific primer 2 is shifted 5-100 bases upstream of the site of gene-specific primer 1.
- the targeting site of gene-specific primer 2 is shifted upstream by 5, 10, 20, 30, 40, 50, 60, 70, 80, 90 or 100 bases than the site of gene-specific primer 1.
- the targeting site of gene-specific primer group 2 is shifted 20-50 bases upstream of the position of gene-specific primer group 1.
- the number of target genes targeted by the gene-specific primer group 1 and the gene-specific primer group 2 respectively is greater than or equal to 1.
- both the gene-specific primer groups 1 and 2 are designed based on the exons downstream of the retained introns of the specific targeted gene.
- the targeting positions of the gene-specific primer groups 1 and 2 may partially overlap but not completely coincide.
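The positional relationship between the two primer groups (group 2 shifted 5-100 bases upstream of group 1, partial overlap allowed, complete coincidence not allowed) can be checked with a short sketch. The `BindingSite` type, the use of transcript coordinates, and measuring the shift from the start of each binding site are assumptions for illustration; the disclosure does not state which reference point is used.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BindingSite:
    start: int  # first transcript position covered (0-based, inclusive)
    end: int    # one past the last transcript position covered

def is_valid_nested_pair(site1: BindingSite, site2: BindingSite,
                         min_shift: int = 5, max_shift: int = 100) -> bool:
    """True if site2 (primer group 2) lies min_shift..max_shift bases
    upstream of site1 (primer group 1) and the sites do not coincide."""
    shift = site1.start - site2.start  # positive when site2 is upstream
    return min_shift <= shift <= max_shift and site1 != site2

def overlap_length(site1: BindingSite, site2: BindingSite) -> int:
    """Number of transcript positions covered by both binding sites."""
    return max(0, min(site1.end, site2.end) - max(site1.start, site2.start))

# Hypothetical coordinates, not taken from the disclosure.
rt_primer = BindingSite(150, 172)
pcr_primer = BindingSite(140, 162)
print(is_valid_nested_pair(rt_primer, pcr_primer))  # True
print(overlap_length(rt_primer, pcr_primer))        # 12
```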
- the molar concentration ratio of common dNTPs and 3'-modified dNTPs added in step (1) is 1:1-1:100, and the molar concentration ratio can be 1:100, 1:90, 1:80, 1:75, 1:70, 1:65, 1:60, 1:55, 1:50, 1:45, 1:40, 1:35, 1:30, 1:25, 1:20, 1:19, 1:18, 1:17, 1:16, 1:15, 1:14, 1:13, 1:12, 1:11 or 1:10; in a preferred embodiment, the molar concentration ratio of common dNTPs and 3'-modified dNTPs added in step (1) is 1:50.
- the molar concentration ratio of common dNTP and 3'-modified dNTP added in step (1) is 1:20.
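To give a feel for how the terminator ratio shapes cDNA length, the sketch below uses a simple geometric model. It rests on strong assumptions that are not stated in the disclosure: the reverse transcriptase is assumed to incorporate AzNTP and ordinary dNTP with equal efficiency, each position is assumed independent, and the ratio is read here as 1 part AzNTP to 20 parts dNTP. Real incorporation efficiencies may differ, so the numbers are illustrative only.

```python
def termination_fraction(az_parts: float, dntp_parts: float) -> float:
    """Per-position probability of incorporating a terminator, assuming
    (hypothetically) equal incorporation efficiency for both nucleotide types."""
    return az_parts / (az_parts + dntp_parts)

def expected_extension(p: float) -> float:
    """Mean number of nucleotides added up to and including the terminator
    (mean of a geometric distribution with success probability p)."""
    return 1.0 / p

def fraction_reaching(p: float, n: int) -> float:
    """Probability that the first n incorporated nucleotides are all ordinary dNTPs."""
    return (1.0 - p) ** n

p = termination_fraction(1, 20)  # assumed reading: AzNTP : dNTP = 1 : 20
print(round(expected_extension(p)))       # 21
print(round(fraction_reaching(p, 50), 3)) # 0.087
```

Under this model, raising the share of ordinary dNTP lengthens the expected product, which is one way to think about the 1:1 to 1:100 range given above.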
- the random sequence in step (2) comprises 4-16 nucleotides.
- the random sequence in step (2) comprises 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 or 16 nucleotides.
- the random sequence in step (2) comprises 5 nucleotides.
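The number of distinct random sequences available at each permitted length follows directly from four possible bases per position. A minimal sketch (the function name is an assumption for illustration):

```python
def random_tag_diversity(length: int) -> int:
    """Number of distinct random sequences of the given length (4 bases per position)."""
    if not 4 <= length <= 16:
        raise ValueError("the disclosure describes random sequences of 4-16 nucleotides")
    return 4 ** length

for n in (4, 5, 16):
    print(n, random_tag_diversity(n))
# 4 256
# 5 1024
# 16 4294967296
```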
- the PCR amplification of step (3) includes the step of performing PCR amplification on the reaction product of step (2) using universal sequencing primer 1 and gene-specific primer group 2, and then adding a connector structure to the amplified product, wherein the connector structure includes a P5/P7 connector and a nucleic acid barcode, and the nucleic acid barcode is connected to the primer P5 and/or P7 connector end.
- step (3) is: using the reaction product of step (2) as a template, performing a PCR reaction with the P5 and P7 adapter primers.
- the P5 linker primer is shown as SEQ ID NO: 54.
- the P7 linker primer is as shown in SEQ ID NO: 55-57.
- the nucleic acid barcode is attached to a single end of the primer P7 adapter.
- the nucleic acid barcode is attached to both ends of primer P5 and P7 adapter.
- the nucleic acid barcode is divided into nucleic acid barcode 5 and nucleic acid barcode 7.
- nucleotide sequence of the nucleic acid barcode 5 is selected from the sequence shown in any one of SEQ ID NO.10-35.
- nucleotide sequence of the nucleic acid barcode 7 is selected from the sequence shown in any one of SEQ ID NO.36-53.
- the PCR amplification reaction in step (3) includes four reaction stages: the first PCR amplification reaction is 1 cycle, and the specific reaction conditions are 94°C for 30 seconds; the second PCR amplification reaction includes 18 cycles, and the specific reaction conditions are 94°C for 30 seconds, 68°C for 30 seconds, and 72°C for 30 seconds; the third PCR amplification reaction includes 1 cycle, and the specific reaction conditions are 72°C for 5 minutes; the fourth PCR amplification reaction includes 1 cycle, and the specific reaction conditions are 12°C.
- the PCR reaction system in step (3) further includes PCR Buffer solutions such as MgCl 2 , DMSO, Tris-HCl, EDTA, NaCl, and KCl.
- the enzyme used in the PCR reaction in step (3) is Taq DNA polymerase.
- the universal sequencing primer 1 sequence is selected from any one of SEQ ID NO: 3, SEQ ID NO: 5 or SEQ ID NO: 6.
- the complementary segment sequence of the universal sequencing primer 1 is shown as SEQ ID NO:4.
- the universal sequencing primer 2 sequence is selected from any one of SEQ ID NO:7 or SEQ ID NO:8.
- the present disclosure provides a high-throughput sequencing library constructed according to the above method.
- the present disclosure provides a targeted high-throughput sequencing method for detecting splicing isoforms, comprising the following steps:
- the sample is a tissue, cell, and/or body fluid sample.
- the body fluid sample includes one or more of blood, saliva, urine, breast milk, cerebrospinal fluid, amniotic fluid, ascites, bile, and pleural effusion.
- the sample is a cell sample.
- the cell samples include, but are not limited to, MCF10A, MCF7, HeLa, HEK293T, and/or MDA-MB-231.
- the target genes include but are not limited to ATP13A1, CXXC1, ECHDC2, FGFRL1, HMGN3, KLHL17, NAXD, LZTR1, SELENBP1, JMJD8, PSMB1, HIGD2A, HNRNPAB, SMARCC1, ATP5IF1, HIGD2B, RPS21, UQCC5, NFATC3, PCNP and/or OSGEP.
- the present disclosure provides the use of the method for establishing a sequencing library for high-throughput sequencing, the high-throughput sequencing library and/or the targeted high-throughput sequencing method in the evaluation of off-target events.
- the off-target event assessment includes the following:
- the high-throughput sequencing method can be used to detect any form of alternative splicing and any combination of transcripts in the transcriptome.
- the construction process of the sequencing library for high-throughput sequencing is as follows, and its principle diagram is shown in FIG2 :
- Reverse transcription: the first strand of cDNA is synthesized by reverse transcription using RNA as a template and a random hexamer primer as the reverse transcription primer, with ordinary dNTPs and a certain concentration of 3'-modified dNTP added to the reaction. The 3'-modified dNTP is AzNTP (3'-azido-2',3'-dNTP), which blocks addition of the next dNTP during chain extension; its 3' modifying group must also be usable for chemical connection to the downstream linker.
- Random Hexamer is a single-stranded DNA with a random sequence of 6 bases, and the characteristics of the random sequence can help it randomly bind to different fragments of RNA.
- dNTPs are added to the 3' end of the primer using RNA as a template to synthesize single-stranded cDNA.
- when a 3'-modified dNTP is incorporated into the cDNA single strand in place of an ordinary dNTP, chain extension terminates and synthesis of the first strand of cDNA is complete.
- the cDNA obtained above is subjected to a click chemistry reaction (i.e., CuAAC click reaction—monovalent copper ion-catalyzed azide-alkyne cycloaddition reaction) with an oligonucleotide modified with an alkyne group at the 5' end.
- the oligonucleotide adapter sequence contains a complementary segment sequence of 5'N5+NGS universal sequencing primer (seq1), which serves as a PCR primer binding site in the subsequent sequencing library construction.
- N5 is a sequence composed of 5 random bases, which is used to ensure the molecular diversity of the sequencing library and to remove systematic biases and sequencing errors in the PCR amplification process in the result analysis.
- the length of the random base sequence can be adjusted to N4, N5, N6, N7, N8, N9, N10, N11, N12, N13, N14, N15, or N16, where N is any one of the bases A, T, C and G.
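One common way a random tag is used in result analysis is to collapse reads that share the same tag and the same cDNA end point, so that PCR duplicates are counted once. The sketch below shows that idea only; the tuple layout, the function name and the example reads are hypothetical, and the disclosure does not specify its analysis pipeline.

```python
from collections import Counter

def count_unique_molecules(reads):
    """Collapse reads sharing (target, cDNA end position, random tag) and
    return the number of distinct molecules per target."""
    unique = set(reads)
    return Counter(target for target, _pos, _tag in unique)

# Hypothetical reads: (target, cDNA 3' end position, N5 tag)
reads = [
    ("geneA", 120, "ACGTA"),
    ("geneA", 120, "ACGTA"),  # PCR duplicate of the read above
    ("geneA", 120, "TTGCA"),  # same end position, different tag
    ("geneA", 98, "ACGTA"),   # same tag, different end position
]
print(count_unique_molecules(reads)["geneA"])  # 3
```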
- the target gene fragment is targeted and enriched through PCR reaction.
- the primers in the PCR system include a universal sequencing primer 1 (seq1, whose 3' end sequence is complementary to the 3' end of the above-mentioned alkyne primer) and a gene-specific primer group.
- the primer group is designed based on the exons downstream of the retained introns of each targeted gene, and a universal sequencing primer 2 sequence (seq2) is added to the 5' end of the primer.
- the CLICK product contains an azide-alkyne linker, so the DNA polymerase needs to use Taq or other polymerases that can amplify this template.
- Sequencing adapter connection and enrichment: using the above enriched fragments as templates, a PCR reaction is performed to add complete P5 and P7 adapters to the enriched fragments for subsequent NGS sequencing. A nucleic acid barcode (sample barcode) is also added. Nucleic acid barcodes allow samples to be pooled during high-throughput sequencing, which improves detection throughput and reduces instrument error during sample sequencing.
- the reverse transcription primer in the high-throughput sequencing library construction process is a gene-specific primer group 1 designed based on the downstream exon of the retained intron of the targeted gene, and a universal sequencing primer sequence (seq2) is added to the 5' end of the specific primer; in some embodiments, in the actual construction process, the sequencing adapter connection and enrichment can be directly performed after the click chemistry connection product is purified; in some embodiments, the construction process of the sequencing library for high-throughput sequencing is as follows, and its schematic diagram is shown in Figure 3:
- RNA is used as a template to synthesize the first strand of cDNA by reverse transcription. A gene-specific primer group 1, designed based on the downstream exons of the retained introns of the targeted gene and carrying a universal sequencing primer sequence (seq2) at the 5' end, is used as the reverse transcription primer, and ordinary dNTPs and a certain concentration of 3'-modified dNTP are added to the reverse transcription reaction. The 3'-modified dNTP is AzNTP (3'-azido-2',3'-dNTP), which blocks addition of the next dNTP during chain extension; at the same time, its 3' modifying group must be usable for chemical connection to the downstream linker.
- dNTPs are added to the 3' end of the primer using RNA as a template to synthesize single-stranded cDNA.
- when a 3'-modified dNTP is incorporated into the cDNA single strand in place of an ordinary dNTP, chain extension terminates and synthesis of the first strand of cDNA is complete.
- the cDNA obtained above is subjected to a click chemistry reaction (i.e., CuAAC click reaction—monovalent copper ion-catalyzed azide-alkyne cycloaddition reaction) with an oligonucleotide modified with an alkyne group at the 5’ end.
- the oligonucleotide adapter sequence contains a complementary segment sequence of 5’N5+NGS universal sequencing primer (seq1), which serves as a PCR primer binding site in the subsequent sequencing library construction.
- N5 is a sequence composed of 5 random bases, which is used to ensure the molecular diversity of the sequencing library and to remove systematic deviations and sequencing errors in the PCR amplification process in the result analysis.
- the length of the random base sequence can be adjusted to N4, N5, N6, N7, N8, N9, N10, N11, N12, N13, N14, N15, and N16.
- Sequencing adapter connection and enrichment Using the click chemistry reaction product as a template, add complete P5 and P7 adapters for subsequent PCR amplification.
- the P5 adapter can complement the Seq 1 complementary sequence in the click chemistry reaction product, and the P7 adapter can complement the Seq 2 complementary sequence generated during the PCR reaction.
- a nucleic acid barcode (sample barcode) is also added. Nucleic acid barcodes allow samples to be pooled during high-throughput sequencing, which improves detection throughput and reduces instrument error during sample sequencing.
- the P5 and P7 adapters are connected and the fragments to be sequenced are enriched for subsequent NGS sequencing.
- the CLICK product contains an azide-alkyne linker, so the DNA polymerase needs to use Taq or other polymerases that can amplify this template.
- the reverse transcription primer used in the high-throughput sequencing library construction process is a gene-specific primer group 1 designed based on the downstream exons of the retained introns of the targeted gene; in some embodiments, a specific gene PCR is performed after the click chemistry ligation reaction to remove the influence of ribosomal RNA in the template on the sequencing data; in some embodiments, the construction process of the sequencing library for high-throughput sequencing is as follows, and its principle diagram is shown in Figure 4:
- Reverse transcription: the first strand of cDNA is synthesized by reverse transcription using RNA as a template. A gene-specific primer group 1, designed based on the downstream exons of the retained introns of the targeted gene and carrying a universal sequencing primer sequence (seq2) at the 5' end, is used as the reverse transcription primer, and ordinary dNTPs and a certain concentration of 3'-modified dNTP are added to the reaction. The 3'-modified dNTP is AzNTP (3'-azido-2',3'-dNTP), which blocks addition of the next dNTP during chain extension; its 3' modifying group must also be usable for chemical connection to the downstream linker.
- dNTPs are added to the 3' end of the primer using RNA as a template to synthesize single-stranded cDNA.
- when a 3'-modified dNTP is incorporated into the cDNA single strand in place of an ordinary dNTP, chain extension terminates and synthesis of the first strand of cDNA is complete.
- the cDNA obtained above is subjected to a click chemistry reaction (i.e., CuAAC click reaction—monovalent copper ion-catalyzed azide-alkyne cycloaddition reaction) with an oligonucleotide modified with an alkyne group at the 5’ end.
- the oligonucleotide adapter sequence contains a complementary segment sequence of 5’N5+NGS universal sequencing primer (seq1), which serves as a PCR primer binding site in the subsequent sequencing library construction.
- N5 is a sequence composed of 5 random bases, which is used to ensure the molecular diversity of the sequencing library and to remove systematic deviations and sequencing errors in the PCR amplification process in the result analysis.
- the length of the random base sequence can be adjusted to N4, N5, N6, N7, N8, N9, N10, N11, N12, N13, N14, N15, and N16.
- Targeted enrichment Using the click chemistry reaction product as a template, the target gene fragment is targeted and enriched through PCR reaction.
- the primers in the PCR system include a universal sequencing primer 1 (seq1, whose 3' end sequence is complementary to the 3' end of the above-mentioned alkyne primer) and a gene-specific primer group 2.
- Primer group 2 is designed based on the exons downstream of the retained introns of each targeted gene, but its targeting site should be shifted 5-100 bases upstream of the position of reverse transcription primer group 1. The two sites may partially overlap but must not coincide completely. The universal sequencing primer 2 sequence (seq2) is added to the 5' end of the primer.
- the CLICK product contains an azide-alkyne linker, so the DNA polymerase needs to use Taq or other polymerases that can amplify this template.
- Sequencing adapter connection and enrichment: using the above enriched fragments as templates, a PCR reaction is performed to add complete P5 and P7 adapters to the enriched fragments for subsequent NGS sequencing. A nucleic acid barcode (sample barcode) is also added. Nucleic acid barcodes allow samples to be pooled during high-throughput sequencing, which improves detection throughput and reduces instrument error during sample sequencing.
- the fragments in the constructed high-throughput sequencing library all include the following elements: P5 sequence, sample tag 5, universal sequencing primer 1 (seq1), inserted DNA fragment, universal sequencing primer 2 (seq2), sample tag 7, and P7 adapter.
- nucleotide sequences of the P5 sequence and P7 are shown as SEQ ID NO:1 (AATGATACGGCGACCACCGAGATCTACAC) and SEQ ID NO:2 (CAAGCAGAAGACGGCATACGAGAT), respectively.
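The element order of a library fragment listed above can be captured in a small layout check. In this sketch only the P5 and P7 sequences come from the text (SEQ ID NO:1 and SEQ ID NO:2); every other sequence is a hypothetical placeholder, and the strand orientation of each element is deliberately not modelled.

```python
P5 = "AATGATACGGCGACCACCGAGATCTACAC"  # SEQ ID NO:1
P7 = "CAAGCAGAAGACGGCATACGAGAT"       # SEQ ID NO:2

# Element order as listed for fragments of the constructed library.
ELEMENT_ORDER = ["P5", "sample tag 5", "seq1", "insert",
                 "seq2", "sample tag 7", "P7"]

def describe_fragment(parts: dict[str, str]) -> str:
    """Return a schematic 'name:length' layout, failing if an element is missing."""
    missing = [name for name in ELEMENT_ORDER if name not in parts]
    if missing:
        raise ValueError(f"missing elements: {missing}")
    return " | ".join(f"{name}:{len(parts[name])} nt" for name in ELEMENT_ORDER)

# Placeholder sequences below are illustrative only, not from the disclosure.
example = {
    "P5": P5, "sample tag 5": "N" * 8, "seq1": "N" * 33, "insert": "N" * 150,
    "seq2": "N" * 34, "sample tag 7": "N" * 8, "P7": P7,
}
print(describe_fragment(example))
```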
- the universal sequencing primer includes two types, PE adapter and Nextera adapter.
- the universal sequencing primers 1 and 2 are selected from any combination of the sequences in Table 1 below.
- nucleotide sequence of the oligonucleotide Hex_N5_Seq1rc is as shown in SEQ ID NO:9 (NNNNNAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT).
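The name Hex_N5_Seq1rc suggests five random bases followed by the reverse complement of seq1. The sketch below computes the reverse complement of the fixed part of SEQ ID NO:9, which is the sequence a primer would need in order to anneal to that segment; whether it matches one of the listed universal sequencing primer 1 sequences should be checked against Table 1. The `split_read` helper assumes, hypothetically, that a read primed from seq1 begins with the 5-base random tag.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

HEX_N5_SEQ1RC = "NNNNNAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"  # SEQ ID NO:9
TAG_LEN = 5
fixed_part = HEX_N5_SEQ1RC[TAG_LEN:]

print(reverse_complement(fixed_part))
# ACACTCTTTCCCTACACGACGCTCTTCCGATCT

def split_read(read: str, tag_len: int = TAG_LEN) -> tuple[str, str]:
    """Split a read into (random tag, remaining insert sequence).
    Assumes the tag occupies the first tag_len bases of the read."""
    return read[:tag_len], read[tag_len:]

print(split_read("ACGTAGGGTTT"))  # ('ACGTA', 'GGGTTT')
```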
- sample tag 5 and the sample tag 7 are selected from any combination of the sequences in Table 2 below.
- sequence information of the P5 and P7 linkers of the complete sequence is shown in Table 3.
- Example 1 Construction of a targeted high-throughput sequencing platform for detecting splicing isoforms
- RNA extraction: use a commercial kit to extract RNA from the target cells, such as the FineProtect Universal RNA Extraction Kit from Jifan Biotechnology (Beijing) Co., Ltd.
- the quality of RNA extraction must be guaranteed during the extraction process for subsequent experiments.
- RNA is quantified using a commercial kit, and the RNA concentration is specifically detected using fluorescent dye technology, such as Invitrogen's Qubit TM RNA High Sensitivity (HS) Quantification Kit.
- Use RNA samples of equal mass as templates for synthesizing the first strand of cDNA, use random hexamers as the reverse transcription primer, and add ordinary dNTPs and 3'-modified dNTPs at a molar concentration ratio of 1/20 to the reverse transcription system for the reverse transcription reaction.
- the product is purified using a commercial kit. For example, Zymo
- the purified product was eluted with pure water to 10 μL.
- the purified cDNA and oligonucleotides containing alkyne modification at the 5' end were subjected to a click reaction (i.e., CuAAC click reaction - monovalent copper ion-catalyzed azide-alkyne cycloaddition reaction) in the presence of the catalyst Copper(II)-TBTA complex at room temperature for 1 hour.
- the components of the click chemistry reaction system are shown in Table 6, and the nucleic acid sequence of oligonucleotides is shown in SEQ ID NO:9.
- the click chemistry reaction product is purified using a commercial kit, such as the DNA Clean&Concentrator-5 kit from Zymo.
- the purified product is eluted with pure water to 10 μL.
- the purified product was used as a template, and the primers designed for the downstream exons of each retained intron of the targeted gene were used as the targeted gene-specific primer combination and a universal sequencing primer (PE1_p26) for PCR reaction.
- the components of the PCR reaction system are shown in Table 7, the preparation of the primer combination mixture is shown in Table 8, the nucleic acid sequence of the PE1_p26 primer is shown in SEQ ID NO: 6, and the qPCR reaction conditions are shown in Table 9.
- the targeted enrichment PCR product was purified using a commercial kit, such as the DNA Clean & Concentrator-5 kit from Zymo.
- the purified product was eluted with pure water to 10 μL.
- the above-obtained targeted enriched fragments are used as templates and the complete P5 and P7 adapters are used as primers to perform PCR reactions for subsequent NGS sequencing.
- the P5 adapter can complement the sequence containing the Seq 1 complementary segment sequence in the click reaction product, and the P7 adapter can complement the sequence containing the Seq 2 complementary sequence generated during the PCR reaction.
- a nucleic acid barcode (sample barcode) is added to a single Add-on PCR primer or two Add-on PCR primers to achieve mixed samples during high-throughput sequencing.
- the components of the PCR reaction system are shown in Table 10, the nucleic acid sequences of the complete sequence of the P5 and P7 adapter primers are shown in Table 3, and the qPCR reaction conditions are shown in Table 11.
- high-throughput sequencing is performed to obtain sequencing information of the target gene in the cell sample, including but not limited to analysis of multiple targeted splicing isoforms, precise quantification of splicing isoforms, evaluation of unknown splicing isoforms and trans-splicing off-target events, etc.
- Example 2 Detection of splicing isoforms in MCF10A, MCF7, and MDA-MB-231 cell lines
- The high-throughput sequencing platform described in Example 1 was used to detect splicing isoforms in MCF10A, MCF7, and MDA-MB-231 cell lines.
- steps 1-6 are the same as those in Example 1.
- primers designed for the downstream exons of the retained introns of the ATP13A1 gene, CXXC1 gene, ECHDC2 gene, FGFRL1 gene, HMGN3 gene, KLHL17 gene, and OSGEP gene were used as targeted enrichment primer combinations and universal sequencing primer PE1_p26 for PCR reaction.
- the components of the primer combination mixture are shown in Table 12, and the primer sequences of each targeted gene are shown in Table 13.
- Steps 8-10 are the same as those in Example 1.
- the high-throughput sequencing results showed that both the normally spliced isoform and the intron-retention splice isoform were present for the target genes in the MCF10A, MCF7, and MDA-MB-231 cell lines, and the ratio of splice isoforms in the targeted genes was successfully quantified.
- the specific test results are shown in Table 14. The above data show that the targeted high-throughput sequencing platform constructed in the present disclosure can effectively realize the detection of splice isoforms.
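One simple way to express a splice isoform ratio from sequencing counts is the fraction of molecules that retain the intron. The disclosure does not state the formula it uses for Table 14, so the definition below is an assumption, and the counts in the usage line are hypothetical.

```python
def intron_retention_ratio(retained: int, spliced: int) -> float:
    """Fraction of molecules retaining the intron among all molecules
    covering the splice junction (assumed definition)."""
    total = retained + spliced
    if total == 0:
        raise ValueError("no reads cover this splice junction")
    return retained / total

# Hypothetical counts, not results from the disclosure.
print(intron_retention_ratio(retained=25, spliced=75))  # 0.25
```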
- Example 3 In-vitro Transcription (IVT) RNA simulation detection of different ratios of splicing isoforms
- an in vitro RNA synthesis sample (IVT product) includes two splice isoform sequences, namely a normal splice isoform and an intron-retention splice isoform. The splice isoforms of the IVT product are mixed in different proportions, the targeted high-throughput sequencing platform is used for detection, and the correlation between the actual detection value of the mixed sample and the theoretical value of the mixed sample is calculated.
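The correlation between detected and theoretical mixing values can be computed as a Pearson coefficient. The disclosure does not say which correlation measure is used, so Pearson's r is an assumption here, and the numbers in the usage lines are hypothetical placeholders rather than data from the disclosure.

```python
from math import sqrt

def pearson_r(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical placeholder values only.
theoretical = [0.1, 0.5, 0.9]
observed = [0.12, 0.47, 0.91]
print(round(pearson_r(theoretical, observed), 3))
```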
- Step 1 Obtain target RNA synthesis sample by in vitro transcription
- the synthesized DNA encoding the normal splice isoform and the intron-retention splice isoform is subjected to an in vitro transcription reaction using a commercial kit, such as the Novazonic T7 High Yield RNA Transcription Kit (TR101).
- the in vitro transcription reaction system is shown in Table 15.
- Step 2 RNA recovery and purification
- RNA synthesized by in vitro transcription was recovered by phenol-ethanol precipitation.
- the final precipitated RNA product was dissolved in double distilled water, the RNA concentration was determined by Nano-Drop, and the RNA was diluted to 2 μM, 0.2 μM, 0.02 μM, and 0.002 μM for later use.
- Step 3 Mixing different ratios of splice isoform IVT products
- the IVT products were mixed in a 1:1 volume ratio as shown in Table 16.
- the mixed IVT products were diluted 100 times and used as templates for synthesizing the first chain of cDNA.
- the reverse primer on the downstream exon of the target intron was used as the reverse transcription primer.
- Ordinary dNTP and 3'-modified dNTP at a molar concentration ratio of 1/15 were added to the reverse transcription system for the reverse transcription reaction.
- steps 5-6 are performed the same as steps 4-6 in Example 1.
- the purified product was used as a template, and the primers designed for the downstream exons of the target intron were used as a combination of targeted gene-specific primers and a universal sequencing primer (PE1_p26) for PCR reaction.
- the components of the PCR reaction system are shown in Table 19, the nucleic acid sequence of the PE1_p26 primer is shown in SEQ ID NO: 6, and the qPCR reaction conditions are shown in Table 20.
- the primer sequences are shown in Table 21.
- steps 9-10 are performed the same as steps 8-9 in Example 1.
- The high-throughput sequencing platform described in Example 1 was used to detect splicing isoforms in the HeLa cell line.
- steps 1-2 are the same as those in Example 1.
- step 3 gene-specific primers are used as reverse transcription primers, and common dNTPs and 3'-modified dNTPs at a molar concentration ratio of 1/15 are added to the reverse transcription system for reverse transcription reaction.
- the gene-specific primer sequences are shown in Table 23, the preparation of the gene-specific primer combination mixture is shown in Table 24, the components of the reverse transcription reaction system are shown in Table 25, and the reverse transcription reaction conditions are shown in Table 26.
- steps 4-6 are the same as those in Example 1.
- the primers designed for the downstream exons of the JMJD8 and JMJD9 genes were used as the targeted enrichment primer combination and the universal sequencing primer PE1_p26 for PCR reaction. Compared with the gene-specific primers in reverse transcription, the primers for targeted enrichment PCR are closer to the intron by about 20-50bp.
- the primer sequences of each targeted gene are shown in Table 27, the preparation of the primer mixture is shown in Table 28, the components of the PCR reaction system are shown in Table 29, the nucleic acid sequence of the PE1_p26 primer is shown in SEQ ID NO: 6, and the PCR reaction conditions are shown in Table 30.
- Steps 8-10 are the same as those in Example 1.
- the final product was subjected to high-throughput sequencing. The results showed that both the normally spliced isoform and the intron-retention splice isoform were present for the target genes in the HeLa cell line, and the ratio of splice isoforms in the targeted genes was successfully quantified.
- the specific test results are shown in Table 31. The above data show that the targeted high-throughput sequencing platform constructed in the present disclosure can effectively realize the detection of splice isoforms.
- The high-throughput sequencing platform described in Example 1 was used to detect on-target trans-splicing and off-target trans-splicing events that occurred after HEK293T cells were transfected with mini-genes and trans-splicing factors (Pre-mRNA Trans-splicing Molecule).
- steps 1-2 are the same as those in Example 1.
- the trans-splicing molecule and mini-gene specific primers are used as reverse transcription primers, and ordinary dNTP and 3'-modified dNTP at a molar concentration ratio of 1/15 are added to the reverse transcription system for the reverse transcription reaction.
- the gene-specific primer sequences are shown in Table 32, the preparation of the gene-specific primer combination mixture is shown in Table 33, the components of the reverse transcription reaction system are shown in Table 34, and the reverse transcription reaction conditions are shown in Table 35.
- steps 4-6 are the same as those in Example 1.
- the trans-splicing molecule-specific and mini-gene-specific primers are combined as the targeted enrichment primer set and used together with the universal sequencing primer PE1_p26 in the PCR reaction.
- the targeted enrichment PCR primers for the trans-splicing molecule and the mini-gene lie about 20-50 bp closer to the 3' splice site and the 5' splice site.
- the primer sequences of each targeted gene are shown in Table 36, the preparation of the primer mixture is shown in Table 37, the components of the PCR reaction system are shown in Table 38, the nucleic acid sequence of the PE1_p26 primer is shown in SEQ ID NO:6, and the PCR reaction conditions are shown in Table 39.
- Steps 8-10 are the same as those in Example 1.
- the final product was subjected to high-throughput sequencing. The results showed that, after HEK293T cells were transfected with mini-genes and trans-splicing factors (Pre-mRNA Trans-splicing Molecule), both on-target and off-target trans-splicing products were detected.
Abstract
A targeted high-throughput sequencing method for detecting splicing isoforms, comprising a method for establishing a sequencing library for high-throughput sequencing. The method for establishing the sequencing library comprises the following steps: 1) reverse transcribing sample RNA with a reverse transcription primer in the presence of ordinary dNTPs and 3'-modified dNTPs to obtain first-strand cDNA; 2) ligating an oligonucleotide fragment carrying an alkyne modification at its 5' end to the cDNA fragment obtained in step 1) by a click chemistry reaction; 3) performing PCR amplification using the reaction product of step 2) as a template; and 4) obtaining a sequencing library. A target enrichment primer can be introduced in the reverse transcription step or the PCR amplification step. The reaction system of the enrichment step comprises multiple gene-specific primers, each designed on a downstream exon segment of an alternative splicing event in the transcript, so that the random-length fragments generated by reverse transcription cover the splice sites.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to PCT international application No. PCT/CN2023/091367, filed on April 27, 2023, the full text of which is incorporated herein by reference.
The present invention relates to the field of biotechnology, and in particular to a targeted high-throughput sequencing method for detecting splicing isoforms.
Alternative splicing (AS) is the process by which different mRNA splicing isoforms are generated from a single mRNA precursor through different splicing modes (that is, by selecting different combinations of splice sites). The process has a degree of flexibility: the final mRNA can selectively include some of the exon and/or intron segments of the precursor RNA, which changes the sequence composition of the encoded protein. Alternative splicing is therefore an important mechanism of proteome diversity.
During gene expression, transcription first produces pre-mRNA, which contains all introns and exons as well as the non-coding regions at both ends (the 5' and 3' untranslated regions, UTRs). RNA splicing is an essential step in the maturation of pre-mRNA into mRNA: the spliceosome removes the introns from the pre-mRNA and joins all exons and UTR sequences. When alternative splicing occurs, the same pre-mRNA can give rise to two or more splicing isoforms. Alternative splicing can be grouped into five forms, of which the two most common are intron retention and exon skipping. As shown in Figure 1, in intron retention an intron is not removed during pre-mRNA splicing and remains in the mRNA; in exon skipping an exon is removed by the spliceosome during pre-mRNA splicing, so that the resulting mRNA lacks that exon.
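The two most common forms described above can be illustrated with a toy model. The following is only a minimal sketch: the exon and intron sequences are made up, and the function name `splice` is a hypothetical helper, not part of the disclosed method.

```python
# Toy pre-mRNA: alternating exons and introns (all sequences are made up).
EXONS = ["AUGGCU", "GGAUCC", "UUCGAA"]   # exon 1, exon 2, exon 3
INTRONS = ["GUAAGUAG", "GUGAGCAG"]       # intron 1, intron 2


def splice(retain_introns=(), skip_exons=()):
    """Join exons in order, optionally retaining introns or skipping exons.

    Indices are 0-based; intron i sits between exon i and exon i + 1.
    """
    parts = []
    for i, exon in enumerate(EXONS):
        if i not in skip_exons:
            parts.append(exon)
        if i < len(INTRONS) and i in retain_introns:
            parts.append(INTRONS[i])
    return "".join(parts)


canonical = splice()                          # all introns removed
intron_retained = splice(retain_introns={0})  # intron 1 stays in the mRNA
exon_skipped = splice(skip_exons={1})         # exon 2 is missing from the mRNA
```

In this toy model the intron-retained isoform is longer than the canonical mRNA by exactly the length of the retained intron, and the exon-skipped isoform is shorter by the length of the skipped exon.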
Commercial RNA library construction kits currently on the market mainly build total mRNA sequencing libraries using different fragmentation methods; none offers targeted enrichment library construction for splicing isoforms, and in particular for unknown splicing isoforms. The main reason is that classic RNA-seq randomly fragments total RNA or messenger RNA (mRNA), reverse transcribes the fragments with random hexamer and/or oligo(dT) primers to form double-stranded DNA, and then completes library construction by adapter ligation. This approach covers only the whole transcriptome, cannot be directed at a small subset of transcripts of interest, and is costly. Because the fragments obtained by RNA-seq fall at random positions and usually do not span splice junctions, extremely high sequencing depth is needed to detect different splicing isoforms accurately, especially when many potential alternative splicing sites await verification, so the sequencing cost is very high. The commercial targeted RNA sequencing kits currently available are all designed and developed for known targets and cannot meet the requirements of targeted sequencing of unknown splicing isoforms. The targeted RNA sequencing methods reported in the literature are likewise difficult to apply in industrial and non-research settings. The main reason is that they depend on commercial companies such as Agilent Technologies and IDT to design and synthesize hybridization probes for targeted enrichment; this process is usually time-consuming, requires repeated optimization, and does not allow an established probe set to be changed flexibly (by adding or removing probes). Newly discovered potential targets therefore cannot be verified promptly and efficiently, and the cost is high. Even for an optimized, mature targeted enrichment system, library construction usually takes at least 2 days, and the hands-on steps are complicated.
In addition, a key point in validating splicing isoforms is to include all isoforms as far as possible, which none of the current methods may fully achieve. Another major factor is that the existing method (Agilent) can analyze only known splicing isoforms: primers must be designed in advance, and amplification primers for different splicing isoforms may compete with one another, even in single droplets with highly diluted template. The classic low-throughput method for quantitative analysis of splicing isoforms is qRT-PCR, but differing amplification efficiencies between isoforms cause systematic deviations in the fluorescence signal, so that even with cumbersome calibration steps only relative quantification can be achieved; moreover, detecting low-abundance splicing isoforms places high demands on primer design, experimental operation and equipment, and the quantitative results are difficult to reproduce.
Therefore, it is particularly important and urgent to develop a more efficient and accurate multiplex targeted enrichment library construction method for RNA.
Summary of the Invention
In order to solve the problems existing in the prior art, the purpose of the present disclosure is to provide a targeted high-throughput sequencing method for detecting splicing isoforms.
In order to achieve the above purpose, the present disclosure adopts the following specific technical solutions:
In one aspect, the present disclosure provides a method for establishing a sequencing library for high-throughput sequencing, comprising the following steps:
(1) reverse transcribing sample RNA with a reverse transcription primer in the presence of ordinary dNTPs and 3'-modified dNTPs to obtain first-strand cDNA, wherein the 3'-modified dNTP is selected from one or more of the following: AzNTP, AmNTP, propargyl-NTP and HalNTP;
(2) ligating an oligonucleotide fragment carrying an alkyne modification at its 5' end to the cDNA fragment obtained in step (1) by a click chemistry reaction, wherein the oligonucleotide fragment comprises a random sequence and a segment complementary to universal sequencing primer 1 (seq1);
(3) performing PCR amplification using the reaction product of step (2) as a template;
(4) obtaining a sequencing library.
In another aspect, the present disclosure provides a high-throughput sequencing library constructed according to the above method.
In another aspect, the present disclosure provides a targeted high-throughput sequencing method for detecting splicing isoforms, comprising the following steps:
(1) extracting RNA from target cells and constructing a sequencing library according to the above method;
(2) performing high-throughput sequencing on the sequencing library to obtain sequencing information for the target genes in the cell sample.
In another aspect, the present disclosure provides use of the aforementioned method for establishing a sequencing library for high-throughput sequencing, the aforementioned high-throughput sequencing library and/or the aforementioned targeted high-throughput sequencing method in the evaluation of off-target events.
Preferably, the off-target event evaluation includes the following:
(1) precisely determining the genomic locations at which off-target trans-splicing by the trans-splicing factor occurs;
(2) quantitatively analyzing on-target trans-splicing and off-target trans-splicing.
The beneficial effects achieved by the present disclosure are at least as follows:
(1) Multiplex analysis of targeted splicing isoforms. The throughput of the traditional qRT-PCR method is limited by the number of fluorescence channels of the detection instrument, bleed-through between fluorophores, and mutual interference between primers in the reaction system. Commonly used qRT-PCR is currently four-plex, whereas the HTAS (High-throughput Targeted Alternative Splicing) analysis platform of the present disclosure can analyze 5-100 (or more) pre-mRNA splicing events simultaneously;
(2) Accurate quantification of splicing isoforms. High-throughput sequencing quantifies splicing isoforms at the single-molecule level (a digital signal), whereas traditional qRT-PCR quantification relies on fluorescence intensity (an analog signal). An analog signal supports only relative quantification, while a digital signal enables absolute quantification of the splicing isoforms in a sample. In addition, a unique advantage of HTAS is that adding a certain proportion of 3'-modified dNTPs during reverse transcription randomly terminates extension of the first-strand cDNA, so that the same type of splicing isoform yields cDNA products of different lengths, and amplification efficiency during library construction is no longer biased by differences in the lengths of the PCR products of different isoforms. Finally, introducing a random sequence (N5; up to 12-16 nucleotides) into the 5' adapter removes systematic bias in PCR amplification (PCR skewing); mixed-sample experiments show a correlation of r > 0.99 between the values actually detected by HTAS and the theoretical values of the mixed samples;
(3) Detection sensitivity. The lowest proportion of intron-retained (IR) splicing isoform detected in the examples is 0.1%. As sequencing depth increases, the detection sensitivity for IR is expected to reach 1/10^5 or lower, far higher than that of traditional RNA-seq, especially for low-abundance transcripts;
(4) Detection of unknown splicing isoforms. Because of the diversity of tissues and cells, a large number of splicing isoforms remain unannotated in the transcriptome under current detection systems. Traditional qRT-PCR schemes are designed mainly for known splicing isoforms, whereas the HTAS platform detects known and unknown splicing isoforms simultaneously, which removes the interference of missing annotations from isoform analysis and improves the specificity of quantitative analysis;
(5) Evaluation of trans-splicing off-target events. Trans-splicing is a new technology for the targeted editing of mRNA, but off-target effects are one of the main bottlenecks to its clinical application. Because off-target events can be expected to occur on any pre-mRNA in the transcriptome, there is currently no method for evaluating them systematically. HTAS needs only the sequence of the trans-splicing molecule (Pre-mRNA Trans-splicing Molecule) to perform qualitative and quantitative analysis of on-target and off-target trans-splicing at the same time;
(6) Simple operation and low cost. No cumbersome library construction and enrichment steps are required, and the whole library construction takes about 4-5 hours. The cost per sample is substantially lower than that of existing products or methods.
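The digital, single-molecule counting described in effect (2) can be sketched as follows. This is an illustrative sketch only, not the analysis pipeline of the disclosure: it assumes that each read has already been assigned a gene, an isoform label and the random sequence from the 5' adapter, and the read tuples and function names are hypothetical.

```python
from collections import defaultdict


def count_molecules(reads):
    """Collapse reads into molecules: one count per unique (gene, isoform, random tag)."""
    seen = set()
    counts = defaultdict(int)
    for gene, isoform, tag in reads:
        key = (gene, isoform, tag)
        if key not in seen:          # later copies are treated as PCR duplicates
            seen.add(key)
            counts[(gene, isoform)] += 1
    return dict(counts)


def isoform_fraction(counts, gene, isoform):
    """Fraction of a gene's molecules that belong to the given isoform."""
    total = sum(n for (g, _), n in counts.items() if g == gene)
    return counts.get((gene, isoform), 0) / total if total else 0.0


# Hypothetical reads: (gene, isoform label, N5 random tag)
reads = [
    ("JMJD8", "spliced", "ACGTA"),
    ("JMJD8", "spliced", "ACGTA"),          # duplicate of the read above
    ("JMJD8", "spliced", "TTGCA"),
    ("JMJD8", "spliced", "GGATC"),
    ("JMJD8", "intron_retained", "CCATG"),
    ("JMJD8", "intron_retained", "CCATG"),  # duplicate
    ("JMJD8", "intron_retained", "CCATG"),  # duplicate
]
counts = count_molecules(reads)
ir_fraction = isoform_fraction(counts, "JMJD8", "intron_retained")
```

Counting raw reads here would put the intron-retained isoform at 3 of 7 reads, whereas counting unique tags gives 1 of 4 molecules; this is the kind of amplification skew that the random sequence is meant to remove.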
Figure 1 is a schematic diagram of the principle of splicing isoform formation.
Figure 2 is a schematic diagram of high-throughput sequencing library construction (Scheme 1).
Figure 3 is a schematic diagram of high-throughput sequencing library construction (Scheme 2).
Figure 4 is a schematic diagram of high-throughput sequencing library construction (Scheme 3).
Figure 5 is a schematic diagram showing the correlation between the actually detected values and the theoretical values for splicing isoform IVT products mixed in different proportions.
I. Definitions
In the present invention, unless otherwise specified, the scientific and technical terms used herein have the meanings commonly understood by those skilled in the art. The terms and laboratory procedures relating to protein and nucleic acid chemistry, molecular biology, cell and tissue culture, microbiology and immunology used herein are terms and routine procedures widely used in the corresponding fields. To aid understanding of the present invention, definitions and explanations of relevant terms are provided below.
Unless otherwise defined, all technical and scientific terms used in the present disclosure have the same meanings as commonly understood by those skilled in the art to which the present disclosure belongs. The terms used in the specification of the present disclosure are only for the purpose of describing specific embodiments and are not intended to limit the present disclosure.
The terms "include" and "have" and any variations thereof in the present disclosure are intended to cover a non-exclusive inclusion. For example, a process, method, apparatus, product or device that comprises a series of steps is not limited to the listed steps or modules, but optionally also includes steps that are not listed, or optionally also includes other steps inherent to such a process, method, product or device.
"A plurality" as mentioned in the present disclosure means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean: A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the objects before and after it. As used herein and unless otherwise stated, the term "about" or "approximately" means within plus or minus 10% of a given value or range. Where an integer is required, the term means within plus or minus 10% of the given value or range, rounded up or down to the nearest integer.
With respect to polypeptide sequences, the phrase "substantially identical" means exhibiting at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or greater sequence identity to a reference polypeptide sequence. With respect to nucleic acid sequences, the phrase means a nucleotide sequence exhibiting at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or greater sequence identity to a reference nucleic acid sequence.
The term "oligo(dT) primer" as used in the present disclosure refers to a repetitive oligonucleotide sequence of about 12-25 thymidine (T) residues that anneals specifically to the poly(A) tail of eukaryotic mRNA. It is therefore unsuitable for RNA lacking a poly(A) tail, such as prokaryotic RNA or miRNA, and for degraded RNA, such as RNA from FFPE samples. Oligo(dT)n VN consists of a fixed, specific sequence, followed by about 20 T residues, followed by the degenerate bases VN; the fixed, specific sequence is present to facilitate the design of a universal downstream PCR primer. The VN in the oligo(dT)n VN sequence denotes anchor bases at the 3' end ("V" represents dATP, dGTP or dCTP; "N" represents any one of dATP, dTTP, dGTP and dCTP). The anchor bases allow the primer to bind specifically at the 5' end of the poly(A) tail, so that an excessive number of T bases is not reverse transcribed.
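The degenerate VN anchor defined above corresponds to a small, enumerable set of primer sequences. The sketch below lists them under stated assumptions: the adapter string stands in for the "fixed, specific sequence" and is made up, and the function name is hypothetical.

```python
from itertools import product

V_BASES = "ACG"    # V: A, C or G (any base except T)
N_BASES = "ACGT"   # N: any of the four bases


def anchored_oligo_dt(adapter="", t_length=20):
    """Return all 5'->3' primer sequences of the form adapter + T(n) + V + N."""
    return [
        adapter + "T" * t_length + v + n
        for v, n in product(V_BASES, N_BASES)
    ]


# The adapter below is a placeholder, not a sequence from the disclosure.
primers = anchored_oligo_dt(adapter="GTGACTGG", t_length=20)
```

Three choices for V and four for N give 12 primer species. Because V is never T, the base next to the T run cannot pair with another A of the tail, which is what pins the primer to the junction between the transcript body and the poly(A) tail.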
The term "click chemistry" as used in the present disclosure is well known in the art and generally refers to fast reactions that are regiospecific and whose products are easy to purify. Click chemistry is a class of reactions that allows a selected substrate to be joined to a specific molecule. It is not a single specific reaction, but describes a way of generating products that follows examples in nature, which also generates substances by joining small modular units. In many applications, click reactions join a biomolecule to a reporter molecule. Click chemistry is not limited to biological conditions: the concept of a "click" reaction has been used in pharmacology and in various biomimetic applications. It is, however, particularly useful in the detection, localization and identification of biomolecules. The classic click reaction is the copper-catalyzed reaction of an azide with an alkyne to form a five-membered heteroatom ring: the Cu(I)-catalyzed azide-alkyne cycloaddition (CuAAC).
The terms "contacting", "adding", "reacting", "treating" and the like as used in the present disclosure mean bringing one reactant, reagent, solvent, catalyst, reactive group, etc. into contact with another reactant, reagent, solvent, catalyst, reactive group, etc. The reactants, reagents, solvents, catalysts, reactive groups, etc. may be added individually, simultaneously or separately, and in any order that achieves the desired result. They may be added in the presence or absence of heating or cooling equipment, and optionally under an inert atmosphere.
The term "complementary" as used in the present disclosure refers to the broad concept of sequence complementarity, through base pairing, between regions of two polynucleotide strands or between two nucleotides. Adenine nucleotides are known to form specific hydrogen bonds ("base pairing") with thymine or uracil nucleotides. Similarly, cytosine nucleotides are known to base pair with guanine nucleotides.
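The pairing rules in this definition (A with T or U, C with G) can be expressed as a short check. This is a minimal sketch that assumes both strands are written 5' to 3' and treats U as equivalent to T; the function names are illustrative.

```python
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def _as_dna(seq):
    """Upper-case the sequence and treat U as T so RNA and DNA compare alike."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq):
    """Return the reverse complement of a sequence, written 5'->3' as DNA."""
    return _as_dna(seq).translate(_DNA_COMPLEMENT)[::-1]


def is_complementary(strand_a, strand_b):
    """True if strand_b pairs with strand_a over its full length (antiparallel)."""
    return _as_dna(strand_b) == reverse_complement(strand_a)
```

For example, `is_complementary("ATGC", "GCAU")` is true because the RNA strand GCAU, read 5' to 3', pairs base by base with ATGC in the antiparallel orientation.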
The term "library" as used in the present disclosure, when used in reference to nucleic acids, is intended to mean a collection of nucleic acids having different chemical compositions (e.g., different sequences, different lengths, etc.). Typically, the nucleic acids in a library will be different species that share a common feature or property of a genus or class but otherwise differ in some way. For example, a library may contain nucleic acid species that differ in nucleotide sequence but are alike in having a sugar-phosphate backbone. Libraries can be created using techniques known in the art. The nucleic acids exemplified herein may include nucleic acids obtained from any source, including, for example, digestion of a genome (e.g., a human genome) or of a mixture of genomes. In another example, the nucleic acids may be those obtained from a metagenomic study of a particular environment or ecosystem. The term also covers artificially created nucleic acid libraries, such as DNA libraries.
The terms "random primer", "random hexamer primer" and "random hexamer" as used in the present disclosure are well known in the art and generally refer to short oligodeoxyribonucleotides of random sequence (d(N)6) that anneal to random complementary sites on target DNA or RNA and serve as primers for DNA synthesis by a DNA polymerase or a reverse transcriptase.
The term "AzNTP (3'-azido-2',3'dNTP)" as used in the present disclosure refers to an azide deoxynucleotide in which the base is selected from adenine, guanine, cytosine and thymine.
The term "Add on PCR" as used in the present disclosure means that the primers used in the PCR carry, in addition to the sequence complementary to the template, some additional sequences. These additional sequences do not take part in the current round of PCR, but because the resulting PCR products carry them, the products can serve as templates for the next PCR step.
II. Detailed Description of Specific Embodiments
In some embodiments, the present disclosure provides a method for establishing a sequencing library for high-throughput sequencing, comprising the following steps:
(1) reverse transcribing sample RNA with a reverse transcription primer in the presence of ordinary dNTPs and 3'-modified dNTPs to obtain first-strand cDNA, wherein the 3'-modified dNTP is selected from one or more of the following: AzNTP, AmNTP, propargyl-NTP and HalNTP;
(2) ligating an oligonucleotide fragment carrying an alkyne modification at its 5' end to the cDNA fragment obtained in step (1) by a click chemistry reaction, wherein the oligonucleotide fragment comprises a random sequence and a segment complementary to universal sequencing primer 1 (seq1);
(3) performing PCR amplification using the reaction product of step (2) as a template;
(4) obtaining a sequencing library.
In some embodiments, the click chemistry reaction is the CuAAC click reaction, i.e., the copper(I)-catalyzed azide-alkyne cycloaddition.
In some embodiments, the reverse transcription reaction in step (1) is run as a five-stage thermal program: 25°C for 10 min, 37°C for 10 min, 50°C for 45 min, 85°C for 2 min, and a hold at 12°C.
In some embodiments, the enzymes used in the reverse transcription reaction in step (1) include HiScript III Reverse Transcriptase (R302-01, Nanjing Vazyme Biotech Co., Ltd.), SuperScript™ III Reverse Transcriptase (18080093, Thermo Fisher Scientific), HiFi II M-MLV (H-) Reverse Transcriptase (CW0743, Kangwei Century Biotechnology Co., Ltd.), Reverse Transcriptase [M-MLV, RNaseH-] (AE101-02, Beijing TransGen Biotech Co., Ltd.) and MutiScript II Reverse Transcriptase (MD311, Fapon Biotech Inc.).
In some embodiments, the reverse transcription primer in step (1) is a random primer, or gene-specific primer group 1 designed on the exon downstream of the retained intron of each targeted gene, wherein the primers in gene-specific primer group 1 carry the universal sequencing primer 2 sequence (seq2) at their 5' ends.
In some embodiments, the click chemistry reaction system in step (2) further includes vitamin C, a copper(II)-TBTA composition and DMSO.
In some embodiments, the PCR amplification in step (3) comprises four reaction stages: the first stage is 1 cycle of 94°C for 1 min, 60°C for 30 s and 68°C for 10 min; the second stage is 12 cycles of 94°C for 30 s, 60°C for 30 s and 68°C for 2 min; the third stage is 1 cycle of 68°C for 5 min; and the fourth stage is a hold at 12°C.
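The four-stage program above can be written down as data, which makes the programmed run time easy to check. This is only a bookkeeping sketch: the data layout and function name are illustrative, and ramp times between temperatures and the final 12°C hold are not counted.

```python
# Each stage: (label, number of cycles, [(temperature in °C, seconds), ...])
PCR_PROGRAM = [
    ("stage 1", 1, [(94, 60), (60, 30), (68, 600)]),
    ("stage 2", 12, [(94, 30), (60, 30), (68, 120)]),
    ("stage 3", 1, [(68, 300)]),
]
HOLD_TEMPERATURE_C = 12  # stage 4: hold, no fixed duration


def programmed_seconds(program):
    """Sum the step durations over all cycles of all stages (ramping excluded)."""
    return sum(
        cycles * sum(seconds for _, seconds in steps)
        for _, cycles, steps in program
    )


total_minutes = programmed_seconds(PCR_PROGRAM) / 60
```

With the values stated in the text, the three timed stages contribute 11.5 min, 36 min and 5 min, so the programmed time before the hold is 52.5 min.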
In some embodiments, the PCR reaction system in step (3) further includes PCR buffer components such as MgCl2, DMSO, Tris-HCl, EDTA, NaCl and KCl.
In some embodiments, the enzyme used in the PCR reaction in step (3) is Taq DNA polymerase.
In some embodiments, the PCR amplification primers used in step (3) comprise universal sequencing primer 1 and gene-specific primer group 2, wherein each primer in gene-specific primer group 2 carries the universal sequencing primer 2 sequence at its 5' end.
In some embodiments, gene-specific primer groups 1 and 2 are both designed against the exon downstream of the retained-intron alternative splicing event of each targeted gene, wherein the target site of gene-specific primer 2 lies 5-100 bases upstream of the target site of gene-specific primer 1.
In some embodiments, the target site of gene-specific primer 2 lies 5, 10, 20, 30, 40, 50, 60, 70, 80, 90 or 100 bases upstream of the target site of gene-specific primer 1.
In some embodiments, the target sites of gene-specific primer group 2 lie 20-50 bases upstream of the positions of gene-specific primer group 1.
In some embodiments, gene-specific primer group 1 and gene-specific primer group 2 each target one or more target genes.
In some embodiments, gene-specific primer groups 1 and 2 are both designed against the exon downstream of the retained intron of each targeted gene.
In some embodiments, the target positions of gene-specific primer groups 1 and 2 may partially overlap but must not be identical.
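The positional constraints on the two primer groups can be expressed as a simple check. This is a minimal sketch under stated assumptions, not the design procedure of the disclosure: primer sites are given as hypothetical start coordinates on the transcript (5' to 3'), "upstream" is taken to mean a smaller coordinate, and the function names are illustrative.

```python
def upstream_shift(primer1_start: int, primer2_start: int) -> int:
    """Bases by which primer 2 sits upstream of primer 1 (transcript coordinates)."""
    return primer1_start - primer2_start


def is_valid_nested_pair(primer1_start: int, primer2_start: int,
                         min_shift: int = 5, max_shift: int = 100) -> bool:
    """True when primer 2 lies min_shift..max_shift bases upstream of primer 1.

    A shift of at least min_shift also guarantees the two sites are not identical.
    """
    return min_shift <= upstream_shift(primer1_start, primer2_start) <= max_shift


def sites_overlap(primer1_start: int, primer2_start: int, primer2_length: int) -> bool:
    """True when the primer 2 site extends into the primer 1 site (partial overlap)."""
    return primer2_start < primer1_start < primer2_start + primer2_length
```

For example, with primer 1 at position 500 and a 22-nt primer 2 at position 485, the shift is 15 bases, the pair passes the 5-100 base rule, and the sites partially overlap, which the text permits. Passing `min_shift=20, max_shift=50` applies the narrower 20-50 base range instead.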
In some embodiments, the molar concentration ratio of ordinary dNTPs to 3'-modified dNTPs added in step (1) is 1:1-1:100, and may be 1:100, 1:90, 1:80, 1:75, 1:70, 1:65, 1:60, 1:55, 1:50, 1:45, 1:40, 1:35, 1:30, 1:25, 1:20, 1:19, 1:18, 1:17, 1:16, 1:15, 1:14, 1:13, 1:12, 1:11 or 1:10; in one preferred embodiment, the molar concentration ratio of ordinary dNTPs to 3'-modified dNTPs added in step (1) is 1:50.
In another preferred embodiment, the molar concentration ratio of ordinary dNTPs to 3'-modified dNTPs added in step (1) is 1:20.
In some embodiments, the random sequence in step (2) comprises 4-16 nucleotides.
In some embodiments, the random sequence in step (2) comprises 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 or 16 nucleotides.
In some embodiments, the random sequence in step (2) comprises 5 nucleotides.
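The length of the random sequence determines how many distinct tags are available to label individual cDNA molecules. The sketch below simply evaluates 4^n for the 4-16 nucleotide range stated above, assuming each position is drawn independently from A, T, C and G; the function name is illustrative.

```python
def random_tag_diversity(length: int) -> int:
    """Number of distinct random sequences of the given length over A, T, C, G."""
    if not 4 <= length <= 16:
        raise ValueError("the text describes random sequences of 4-16 nucleotides")
    return 4 ** length


if __name__ == "__main__":
    for n in range(4, 17):
        print(f"N{n}: {random_tag_diversity(n)} possible sequences")
```

An N5 tag therefore offers 1024 possible sequences, and an N16 tag offers 4,294,967,296.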
In some embodiments, the PCR amplification of step (3) comprises amplifying the reaction product of step (2) with universal sequencing primer 1 and gene-specific primer group 2, and then adding an adapter structure to the amplified product, wherein the adapter structure comprises P5/P7 adapters and a nucleic acid barcode, and the nucleic acid barcode is attached at the P5 and/or P7 adapter end of the primer.
In some embodiments, step (3) is: using the reaction product of step (2) as a template, adding P5 and P7 adapters and performing a PCR reaction.
In some embodiments, the P5 adapter primer is as shown in SEQ ID NO: 54.
In some embodiments, the P7 adapter primer is as shown in SEQ ID NO: 55-57.
In some embodiments, the nucleic acid barcode is attached at the P7 adapter end only.
In some embodiments, the nucleic acid barcode is attached at both the P5 and P7 adapter ends.
In some embodiments, the nucleic acid barcodes are divided into nucleic acid barcode 5 and nucleic acid barcode 7.
In some embodiments, the nucleotide sequence of nucleic acid barcode 5 is selected from any one of the sequences shown in SEQ ID NO: 10-35.
In some embodiments, the nucleotide sequence of nucleic acid barcode 7 is selected from any one of the sequences shown in SEQ ID NO: 36-53.
In some embodiments, the PCR amplification in step (3) comprises four stages: a first stage of 1 cycle at 94°C for 30 s; a second stage of 18 cycles at 94°C for 30 s, 68°C for 30 s, and 72°C for 30 s; a third stage of 1 cycle at 72°C for 5 min; and a fourth stage of 1 cycle held at 12°C.
In some embodiments, the PCR reaction system in step (3) further comprises PCR buffer components such as MgCl2, DMSO, Tris-HCl, EDTA, NaCl, and KCl.
In some embodiments, the enzyme used in the PCR reaction in step (3) is Taq DNA polymerase.
In some embodiments, the universal sequencing primer 1 sequence is selected from any one of SEQ ID NO: 3, SEQ ID NO: 5 or SEQ ID NO: 6.
In some embodiments, the sequence of the segment complementary to universal sequencing primer 1 is as shown in SEQ ID NO: 4.
In some embodiments, the universal sequencing primer 2 sequence (seq2) is selected from any one of SEQ ID NO: 7 or SEQ ID NO: 8.
In some embodiments, the present disclosure provides a high-throughput sequencing library constructed according to the above method.
In some embodiments, the present disclosure provides a targeted high-throughput sequencing method for detecting splicing isoforms, comprising the following steps:
(1) extracting RNA from a sample and constructing a sequencing library according to the above method;
(2) performing high-throughput sequencing on the sequencing library to obtain sequencing information for the target genes in the sample.
In some embodiments, the sample is a tissue, cell, and/or body fluid sample.
In some embodiments, the body fluid sample comprises one or more of blood, saliva, urine, breast milk, cerebrospinal fluid, amniotic fluid, ascites, bile, and pleural effusion.
In some embodiments, the sample is a cell sample.
In some embodiments, the cell sample includes, but is not limited to, MCF10A, MCF7, HeLa, HEK293T, and/or MDA-MB-231.
In some embodiments, the target genes include, but are not limited to, ATP13A1, CXXC1, ECHDC2, FGFRL1, HMGN3, KLHL17, NAXD, LZTR1, SELENBP1, JMJD8, PSMB1, HIGD2A, HNRNPAB, SMARCC1, ATP5IF1, HIGD2B, RPS21, UQCC5, NFATC3, PCNP and/or OSGEP.
In some embodiments, the present disclosure provides use of the aforementioned method for establishing a sequencing library for high-throughput sequencing, the aforementioned high-throughput sequencing library, and/or the aforementioned targeted high-throughput sequencing method in the evaluation of off-target events.
In some embodiments, the off-target event evaluation comprises the following:
(1) accurately determining the genomic location where off-target trans-splicing occurs in trans-splicing factors;
(2) quantitatively analyzing on-target trans-splicing and off-target trans-splicing.
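The two evaluation items above, locating off-target sites and quantifying on-target versus off-target events, can be sketched as a small summary over junction read counts. This is an illustrative assumption about how such a summary might be computed, not the analysis pipeline of the disclosure; the input layout, the function name, and the example coordinates are all hypothetical.

```python
def summarize_trans_splicing(junction_counts, on_target):
    """Summarize on-target and off-target trans-splicing from junction read counts.

    junction_counts maps a genomic location, e.g. (chromosome, position),
    to the number of reads supporting trans-splicing at that location.
    on_target is the location of the intended trans-splicing site.
    """
    total = sum(junction_counts.values())
    if total == 0:
        raise ValueError("no trans-splicing reads to summarize")
    on_reads = junction_counts.get(on_target, 0)
    off_sites = {site: n for site, n in junction_counts.items() if site != on_target}
    return {
        "on_target_fraction": on_reads / total,
        "off_target_fraction": (total - on_reads) / total,
        # Off-target locations, most frequently observed first.
        "off_target_sites": sorted(off_sites, key=off_sites.get, reverse=True),
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    counts = {("chr1", 1000): 90, ("chr2", 500): 7, ("chr5", 42): 3}
    print(summarize_trans_splicing(counts, on_target=("chr1", 1000)))
```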
In some embodiments, the high-throughput sequencing method can be used to detect any form of alternative splicing and any combination of transcripts in the transcriptome.
In some embodiments, the sequencing library for high-throughput sequencing is constructed as follows (schematic in FIG. 2):
1) Reverse transcription: first-strand cDNA is synthesized from the RNA template by reverse transcription, using random hexamer primers as the reverse transcription primers, with ordinary dNTPs and a defined concentration of 3'-modified dNTPs added to the reaction. The 3'-modified dNTP is AzNTP (3'-azido-2',3'-dideoxynucleoside triphosphate), which blocks addition of the next dNTP during chain extension and whose 3' modifying group is available for chemical ligation to the downstream adapter. A random hexamer is a single-stranded DNA with a random sequence of 6 bases; the random sequence allows it to anneal at random to different regions of the RNA. During reverse transcription, reverse transcriptase adds dNTPs to the 3' end of the primer, using the RNA as template, to synthesize single-stranded cDNA. When a 3'-modified dNTP is incorporated into the cDNA strand in place of a dNTP, chain extension terminates and first-strand cDNA synthesis is complete.
2) Click-chemistry ligation: the cDNA obtained above is reacted with an oligonucleotide bearing an alkyne modification at its 5' end in a click reaction (the CuAAC click reaction, i.e. the copper(I)-catalyzed azide-alkyne cycloaddition). The oligonucleotide adapter sequence comprises, from its 5' end, N5 followed by the segment complementary to the NGS universal sequencing primer (seq1), which serves as the PCR primer binding site in subsequent library construction. N5 is a sequence of 5 random bases, used to ensure the molecular diversity of the sequencing library and to remove systematic bias from PCR amplification and sequencing errors during data analysis. The length of the random sequence can be adjusted to N4, N5, N6, N7, N8, N9, N10, N11, N12, N13, N14, N15 or N16, where each N is any one of the bases A, T, C and G.
3) Targeted enrichment: using the click reaction product as template, the target gene fragments are enriched by PCR. The primers in the PCR system comprise universal sequencing primer 1 (seq1, whose 3'-end sequence is complementary to the 3' end of the alkyne oligonucleotide above) and a gene-specific primer group. The primer group is designed against the exon downstream of the retained intron of each targeted gene, and the universal sequencing primer 2 sequence (seq2) is added at the 5' end of each primer. Because the click product contains an azide-alkyne linkage, the DNA polymerase must be Taq or another polymerase able to amplify such a template.
4) Sequencing adapter addition and enrichment: using the enriched fragments above as template, a PCR reaction is performed to add the full-length P5 and P7 adapters to the enriched fragments for subsequent NGS sequencing, and a nucleic acid barcode (sample barcode) is added. Nucleic acid barcodes allow samples to be pooled for high-throughput sequencing, which increases detection throughput and reduces instrument error during sample sequencing.
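Step 2) states that the random N5 tag is used during analysis to remove PCR amplification bias. One common way to do this is to collapse reads that share the same gene, cDNA end position and tag into a single molecule. The sketch below assumes this approach and a hypothetical read representation; the disclosure does not specify the exact procedure, and sequencing errors within the tag are not handled here.

```python
from collections import defaultdict


def count_unique_molecules(reads):
    """Count distinct cDNA molecules per gene after collapsing PCR duplicates.

    Each read is a (gene, cdna_end_position, random_tag) tuple. Reads with the
    same end position and the same random tag are treated as copies of one
    original molecule.
    """
    seen = defaultdict(set)
    for gene, end_position, tag in reads:
        seen[gene].add((end_position, tag))
    return {gene: len(keys) for gene, keys in seen.items()}


if __name__ == "__main__":
    # Hypothetical reads for illustration only.
    reads = [
        ("ATP13A1", 120, "ACGTA"),
        ("ATP13A1", 120, "ACGTA"),  # PCR duplicate of the first read
        ("ATP13A1", 120, "TTGCA"),  # same position, different molecule
        ("CXXC1", 88, "ACGTA"),
    ]
    print(count_unique_molecules(reads))
```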
In some embodiments, the reverse transcription primers used in library construction are gene-specific primer group 1, designed against the exon downstream of the retained intron of each targeted gene, with a universal sequencing primer sequence (seq2) added at the 5' end of each specific primer; in some embodiments, sequencing adapter addition and enrichment can be performed directly after purification of the click ligation product; in some embodiments, the sequencing library for high-throughput sequencing is constructed as follows (schematic in FIG. 3):
1) Reverse transcription: first-strand cDNA is synthesized from the RNA template by reverse transcription, using as reverse transcription primers gene-specific primer group 1, designed against the exon downstream of the retained intron of each targeted gene and carrying a universal sequencing primer sequence (seq2) at the 5' end, with ordinary dNTPs and a defined concentration of 3'-modified dNTPs added to the reaction. The 3'-modified dNTP is AzNTP (3'-azido-2',3'-dideoxynucleoside triphosphate), which blocks addition of the next dNTP during chain extension and whose 3' modifying group is available for chemical ligation to the downstream adapter. During reverse transcription, reverse transcriptase adds dNTPs to the 3' end of the primer, using the RNA as template, to synthesize single-stranded cDNA. When a 3'-modified dNTP is incorporated into the cDNA strand in place of a dNTP, chain extension terminates and first-strand cDNA synthesis is complete.
2) Click-chemistry ligation: the cDNA obtained above is reacted with an oligonucleotide bearing an alkyne modification at its 5' end in a click reaction (the CuAAC click reaction, i.e. the copper(I)-catalyzed azide-alkyne cycloaddition). The oligonucleotide adapter sequence comprises, from its 5' end, N5 followed by the segment complementary to the NGS universal sequencing primer (seq1), which serves as the PCR primer binding site in subsequent library construction. N5 is a sequence of 5 random bases, used to ensure the molecular diversity of the sequencing library and to remove systematic bias from PCR amplification and sequencing errors during data analysis. The length of the random sequence can be adjusted to N4, N5, N6, N7, N8, N9, N10, N11, N12, N13, N14, N15 or N16.
3) Sequencing adapter addition and enrichment: using the click reaction product as template, the full-length P5 and P7 adapters are added for PCR amplification. The P5 adapter is complementary to the seq1-complementary sequence in the click reaction product, and the P7 adapter is complementary to the seq2-complementary sequence generated during the PCR reaction. A nucleic acid barcode (sample barcode) is also added. Nucleic acid barcodes allow samples to be pooled for high-throughput sequencing, which increases detection throughput and reduces instrument error during sample sequencing. After several rounds of PCR, the P5 and P7 adapters are attached and the fragments to be sequenced are enriched for subsequent NGS sequencing. Because the click product contains an azide-alkyne linkage, the DNA polymerase must be Taq or another polymerase able to amplify such a template.
In some embodiments, the reverse transcription primers used in library construction are gene-specific primer group 1, designed against the exon downstream of the retained intron of each targeted gene; in some embodiments, a gene-specific PCR is performed after the click ligation reaction to remove the influence of ribosomal RNA in the template on the sequencing data; in some embodiments, the sequencing library for high-throughput sequencing is constructed as follows (schematic in FIG. 4):
1) Reverse transcription: first-strand cDNA is synthesized from the RNA template by reverse transcription, using as reverse transcription primers gene-specific primer group 1, designed against the exon downstream of the retained intron of each targeted gene and carrying a universal sequencing primer sequence (seq2) at the 5' end, with ordinary dNTPs and a defined concentration of 3'-modified dNTPs added to the reaction. The 3'-modified dNTP is AzNTP (3'-azido-2',3'-dideoxynucleoside triphosphate), which blocks addition of the next dNTP during chain extension and whose 3' modifying group is available for chemical ligation to the downstream adapter. During reverse transcription, reverse transcriptase adds dNTPs to the 3' end of the primer, using the RNA as template, to synthesize single-stranded cDNA. When a 3'-modified dNTP is incorporated into the cDNA strand in place of a dNTP, chain extension terminates and first-strand cDNA synthesis is complete.
2) Click-chemistry ligation: the cDNA obtained above is reacted with an oligonucleotide bearing an alkyne modification at its 5' end in a click reaction (the CuAAC click reaction, i.e. the copper(I)-catalyzed azide-alkyne cycloaddition). The oligonucleotide adapter sequence comprises, from its 5' end, N5 followed by the segment complementary to the NGS universal sequencing primer (seq1), which serves as the PCR primer binding site in subsequent library construction. N5 is a sequence of 5 random bases, used to ensure the molecular diversity of the sequencing library and to remove systematic bias from PCR amplification and sequencing errors during data analysis. The length of the random sequence can be adjusted to N4, N5, N6, N7, N8, N9, N10, N11, N12, N13, N14, N15 or N16.
3) Targeted enrichment: using the click reaction product as template, the target gene fragments are enriched by PCR. The primers in the PCR system comprise universal sequencing primer 1 (seq1, whose 3'-end sequence is complementary to the 3' end of the alkyne oligonucleotide above) and gene-specific primer group 2. Primer group 2 is designed against the exon downstream of the retained intron of each targeted gene, but its target sites should lie 5-100 bases upstream of those of reverse transcription primer group 1; the two sites may partially overlap but must not be identical. The universal sequencing primer 2 sequence (seq2) is added at the 5' end of each primer. Because the click product contains an azide-alkyne linkage, the DNA polymerase must be Taq or another polymerase able to amplify such a template.
4) Sequencing adapter addition and enrichment: using the enriched fragments above as template, a PCR reaction is performed to add the full-length P5 and P7 adapters to the enriched fragments for subsequent NGS sequencing, and a nucleic acid barcode (sample barcode) is added. Nucleic acid barcodes allow samples to be pooled for high-throughput sequencing, which increases detection throughput and reduces instrument error during sample sequencing.
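Because the sample barcodes added in step 4) allow several samples to be pooled in one run, the reads must later be assigned back to their samples. The following sketch shows one simple way to do so with exact matching on a barcode pair. The barcode strings and sample names are placeholders, not the sequences of SEQ ID NO: 10-53, and the function is illustrative only.

```python
def demultiplex(reads, sample_sheet):
    """Assign pooled reads to samples by their (barcode 5, barcode 7) pair.

    reads: iterable of (barcode5, barcode7, sequence) tuples.
    sample_sheet: dict mapping (barcode5, barcode7) to a sample name.
    Returns (per-sample sequences, sequences with an unknown barcode pair).
    """
    bins = {name: [] for name in sample_sheet.values()}
    unassigned = []
    for barcode5, barcode7, sequence in reads:
        name = sample_sheet.get((barcode5, barcode7))
        if name is None:
            unassigned.append(sequence)
        else:
            bins[name].append(sequence)
    return bins, unassigned


if __name__ == "__main__":
    # Placeholder barcodes and sample names for illustration only.
    sheet = {("AAAACCCC", "GGGGTTTT"): "sample_A", ("AAAACCCC", "TTTTGGGG"): "sample_B"}
    pooled = [
        ("AAAACCCC", "GGGGTTTT", "ACGTACGT"),
        ("AAAACCCC", "TTTTGGGG", "TTGACCAA"),
        ("CCCCAAAA", "GGGGTTTT", "GGGGGGGG"),  # barcode pair not in the sheet
    ]
    print(demultiplex(pooled, sheet))
```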
In some embodiments, each fragment in the constructed high-throughput sequencing library comprises the following elements: P5 sequence, sample tag 5, universal sequencing primer 1 (seq1), inserted DNA fragment, universal sequencing primer 2 (seq2), sample tag 7, and P7 adapter.
In some embodiments, the nucleotide sequences of the P5 sequence and the P7 sequence are as shown in SEQ ID NO: 1 (AATGATACGGCGACCACCGAGATCTACAC) and SEQ ID NO: 2 (CAAGCAGAAGACGGCATACGAGAT), respectively.
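The element order of a library fragment can be captured in a small layout check. In the sketch below only the P5 and P7 sequences come from the text (SEQ ID NO: 1 and 2); the sample tags, seq1, seq2 and insert are placeholder strings of assumed length, and strand orientation is deliberately ignored, so this illustrates the layout rather than an actual library molecule.

```python
P5 = "AATGATACGGCGACCACCGAGATCTACAC"  # SEQ ID NO: 1, as given in the text
P7 = "CAAGCAGAAGACGGCATACGAGAT"       # SEQ ID NO: 2, as given in the text

# Element order stated in the text, from the P5 end to the P7 end.
ELEMENT_ORDER = ("P5", "sample_tag_5", "seq1", "insert", "seq2", "sample_tag_7", "P7")

# Placeholder elements with assumed lengths, for illustration only.
EXAMPLE_PARTS = {
    "P5": P5,
    "sample_tag_5": "N" * 8,
    "seq1": "N" * 33,
    "insert": "N" * 150,
    "seq2": "N" * 34,
    "sample_tag_7": "N" * 8,
    "P7": P7,
}


def describe_fragment(parts):
    """Return (element name, length) pairs in library order; fail on missing elements."""
    missing = [name for name in ELEMENT_ORDER if name not in parts]
    if missing:
        raise ValueError(f"missing elements: {missing}")
    return [(name, len(parts[name])) for name in ELEMENT_ORDER]


def fragment_length(parts) -> int:
    """Total length of a fragment built from the given elements."""
    return sum(length for _, length in describe_fragment(parts))
```

With these placeholder lengths, `fragment_length(EXAMPLE_PARTS)` is 286; the real value depends on the actual tag, primer and insert sequences.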
In some embodiments, the universal sequencing primers are of two types, PE adapters and Nextera adapters.
In some embodiments, universal sequencing primers 1 and 2 are selected from any combination of the sequences in Table 1 below.
Table 1 Sequence information of universal sequencing primers
In some embodiments, the nucleotide sequence of the oligonucleotide Hex_N5_Seq1rc is as shown in SEQ ID NO: 9 (NNNNNAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT).
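The Hex_N5_Seq1rc oligonucleotide consists of the random N5 tag followed by a constant segment. Since that segment is described as complementary to the seq1 primer, reverse-complementing it gives the sequence a seq1-type primer would have to match. The sketch below splits the oligonucleotide of SEQ ID NO: 9 and computes that reverse complement; the helper names are illustrative.

```python
HEX_N5_SEQ1RC = "NNNNNAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"  # SEQ ID NO: 9

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(sequence: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return sequence.translate(_COMPLEMENT)[::-1]


def split_adapter(oligo: str):
    """Split an adapter oligonucleotide into its leading random tag and constant segment."""
    tag_length = len(oligo) - len(oligo.lstrip("N"))
    return oligo[:tag_length], oligo[tag_length:]


random_tag, constant_segment = split_adapter(HEX_N5_SEQ1RC)
primer_binding_sequence = reverse_complement(constant_segment)

if __name__ == "__main__":
    print(random_tag)               # NNNNN
    print(primer_binding_sequence)  # ACACTCTTTCCCTACACGACGCTCTTCCGATCT
```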
In some embodiments, sample tag 5 and sample tag 7 are selected from any combination of the sequences in Table 2 below.
Table 2 Sequence information
In some embodiments, the sequence information of the full-length P5 and P7 adapters is shown in Table 3.
Table 3 Sequence information
Examples
The technical solutions of the present disclosure are further described below by way of specific embodiments. Those skilled in the art should understand that the examples are provided only to aid understanding of the present disclosure and should not be regarded as specific limitations of it.
Example 1: Construction of a targeted high-throughput sequencing platform for detecting splicing isoforms
1. RNA extraction
RNA is extracted from the target cells with a commercial kit, for example the FineProtect Universal RNA Extraction Kit from Jifan Biotechnology (Beijing) Co., Ltd. RNA quality must be ensured during extraction so that subsequent experiments can proceed.
2. RNA quantification
RNA is quantified with a commercial kit that uses fluorescent dye technology to measure RNA concentration specifically, for example the Qubit™ RNA High Sensitivity (HS) quantification kit from Invitrogen.
3. Reverse transcription
Based on the concentrations measured above, equal masses of RNA are used as templates for first-strand cDNA synthesis, with random hexamer primers as the reverse transcription primers; ordinary dNTPs and 3'-modified dNTPs at a 1/20 molar concentration ratio are added to the reverse transcription system for the reverse transcription reaction.
The components of the reverse transcription reaction system are shown in Table 4, and the reverse transcription reaction conditions are shown in Table 5.
Table 4: Reverse transcription reaction system (20 μL/reaction)
Table 5: Reverse transcription reaction conditions
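The ratio of 3'-modified to ordinary dNTPs controls how often chain extension terminates, and therefore how long the cDNA fragments are. The sketch below is a deliberately idealized model, not a result from the disclosure. It assumes the modified nucleotide is present at one-twentieth the concentration of the ordinary dNTPs, and that the reverse transcriptase incorporates both with a relative efficiency given by the hypothetical parameter `relative_efficiency` (1 means no discrimination). Under these assumptions termination is a geometric process.

```python
from fractions import Fraction


def termination_probability(modified_to_ordinary: Fraction,
                            relative_efficiency: Fraction = Fraction(1)) -> Fraction:
    """Probability that a given incorporation event adds a chain terminator.

    modified_to_ordinary: molar ratio of 3'-modified dNTP to ordinary dNTP.
    relative_efficiency: assumed incorporation efficiency of the modified
    nucleotide relative to the ordinary one (hypothetical parameter).
    """
    weighted = modified_to_ordinary * relative_efficiency
    return weighted / (1 + weighted)


def mean_extension_length(probability: Fraction) -> Fraction:
    """Mean number of nucleotides added, terminator included (geometric model)."""
    return 1 / probability


if __name__ == "__main__":
    p = termination_probability(Fraction(1, 20))
    print(p, mean_extension_length(p))  # 1/21 21
```

Real enzymes often incorporate modified nucleotides less efficiently than ordinary ones, so observed fragments may be considerably longer than this idealized estimate; lowering `relative_efficiency` models that effect.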
4. cDNA purification
After first-strand cDNA synthesis is complete, the product is purified with a commercial DNA purification kit, for example the DNA Clean&Concentrator-5 kit from Zymo. The purified product is eluted in 10 μL of pure water.
5. Click ligation
The purified cDNA above and the oligonucleotide bearing an alkyne modification at its 5' end are subjected to the click reaction (the CuAAC click reaction, i.e. the copper(I)-catalyzed azide-alkyne cycloaddition) at room temperature for 1 hour in the presence of the catalyst Copper(II)-TBTA complex.
The components of the click reaction system are shown in Table 6, and the nucleic acid sequence of the oligonucleotide is shown in SEQ ID NO: 9.
Table 6: Click chemistry reaction system (37.8 μL/reaction)
6. Purification of the click ligation product
The click reaction product is purified with a commercial kit, for example the DNA Clean&Concentrator-5 kit from Zymo. The purified product is eluted in 10 μL of pure water.
7. Targeted enrichment
Using the purified product above as template, a PCR reaction is performed with a target-gene-specific primer mix, consisting of primers designed against the exon downstream of each retained intron of the targeted genes, together with one universal sequencing primer (PE1_p26).
The components of the PCR reaction system are shown in Table 7, the preparation of the primer mix is shown in Table 8, the nucleic acid sequence of primer PE1_p26 is shown in SEQ ID NO: 6, and the qPCR reaction conditions are shown in Table 9.
Table 7: PCR reaction system (25 μL/reaction)
Table 8: Composition of the primer mix
Table 9: PCR reaction conditions
8. Purification of the targeted enrichment product
The targeted enrichment PCR product is purified with a commercial kit, for example the DNA Clean&Concentrator-5 kit from Zymo. The purified product is eluted in 10 μL of pure water.
9. Sequencing adapter addition and enrichment
Using the targeted enrichment fragments obtained above as template and the full-length P5 and P7 adapters as primers, a PCR reaction is performed to prepare the library for subsequent NGS sequencing. The P5 adapter is complementary to the sequence in the click reaction product that contains the seq1-complementary segment, and the P7 adapter is complementary to the seq2-complementary sequence generated during the PCR reaction. A nucleic acid barcode (sample barcode) is included in one or both add-on PCR primers so that samples can be pooled for high-throughput sequencing.
The components of the PCR reaction system are shown in Table 10, the nucleic acid sequences of the full-length P5 and P7 adapter primers are shown in Table 3, and the qPCR reaction conditions are shown in Table 11.
Table 10: PCR reaction system (25 μL/reaction)
Table 11: PCR reaction conditions
10. High-throughput sequencing and result analysis
The sequencing library amplified in the steps above is subjected to high-throughput sequencing to obtain sequencing information for the target genes in the cell sample, including but not limited to analysis of multiplexed targeted splicing isoforms, accurate quantification of splicing isoforms, and evaluation of unknown splicing isoforms and of trans-splicing off-target events.
Example 2: Detection of splicing isoforms in MCF10A, MCF7, and MDA-MB-231 cell lines
The high-throughput sequencing platform described in Example 1 was used to detect splicing isoforms in the MCF10A, MCF7, and MDA-MB-231 cell lines.
Steps 1-6 were performed as in Example 1.
In the targeted enrichment of step 7, primers designed against the exon downstream of the retained intron in each of the ATP13A1, CXXC1, ECHDC2, FGFRL1, HMGN3, KLHL17, and OSGEP genes were pooled as the targeted enrichment primer combination and used together with the universal sequencing primer PE1_p26 for the PCR reaction. The composition of the primer combination mixture is shown in Table 12, and the primer sequences for each targeted gene are shown in Table 13.
Table 12: Composition of the primer combination mixture
Table 13: Nucleic acid sequences of primers
Steps 8-10 were performed as in Example 1.
High-throughput sequencing showed that both the normally spliced isoform and the intron-retention splicing isoform were detected for the target genes in the MCF10A, MCF7, and MDA-MB-231 cell lines, and the proportion of the splicing isoform in each targeted gene was quantified; the specific results are shown in Table 14. These data indicate that the targeted high-throughput sequencing platform constructed in the present disclosure can effectively detect splicing isoforms.
Table 14: Detection results for target genes in the MCF10A, MCF7, and MDA-MB-231 cell lines
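The proportion of the intron-retention isoform for a given gene can be expressed as the retained count divided by the sum of the retained and normally spliced counts. The following minimal sketch shows that arithmetic; the function name and the counts used in the usage note are hypothetical and are not values from Table 14.

```python
def intron_retention_ratio(retained_count, spliced_count):
    """Return the fraction of counts supporting the intron-retention isoform.

    Returns None when neither isoform was observed, so that a missing
    measurement is not confused with a ratio of zero.
    """
    total = retained_count + spliced_count
    if total == 0:
        return None
    return retained_count / total
```

For example, 25 retained counts and 75 spliced counts would give a ratio of 0.25.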
Example 3: Detection of in vitro transcribed (IVT) RNA simulating different ratios of splicing isoforms
In vitro synthesized RNA samples (IVT products) were prepared for two isoform sequences: the normally spliced isoform and the intron-retention splicing isoform. The IVT products were mixed to simulate different isoform ratios, the mixtures were assayed with the targeted high-throughput sequencing platform constructed in the present disclosure, and the correlation between the measured and theoretical values of the mixed samples was calculated. The experimental details are as follows:
Step 1: Obtaining the target RNA samples by in vitro transcription
The pre-synthesized DNA templates for the normally spliced isoform and the intron-retention splicing isoform were transcribed in vitro using a commercial kit, such as the Vazyme T7 High Yield RNA Transcription Kit (TR101).
The in vitro transcription reaction system is shown in Table 15.
Table 15: In vitro transcription system (20 μL system)
After the above reaction was incubated at 37°C for 2 hours, 1 μL of DNase I was added and incubation was continued at 37°C for 15 min to remove the DNA template.
Step 2: RNA recovery and purification
The in vitro transcribed RNA was recovered by the phenol-ethanol precipitation method. The final RNA pellet was dissolved in double-distilled water, the RNA concentration was measured on a NanoDrop, and the RNA was diluted to 2 μM, 0.2 μM, 0.02 μM, and 0.002 μM for later use.
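Preparing the molar dilution series requires converting the mass concentration reported by the spectrophotometer into a molar concentration. The sketch below uses the common approximation that a single-stranded RNA of N nucleotides has a molecular weight of about N × 320.5 + 159 g/mol, then applies C1·V1 = C2·V2 to find the stock volume for each target concentration. The transcript length and concentration values a user would pass in are assumptions; the source does not give them.

```python
def rna_molar_conc_um(ng_per_ul, length_nt):
    """Convert an RNA mass concentration (ng/uL) to micromolar.

    Uses the approximate ssRNA molecular weight N * 320.5 + 159 g/mol.
    """
    molecular_weight = length_nt * 320.5 + 159.0
    # ng/uL equals mg/L; dividing by g/mol and scaling by 1000 gives umol/L.
    return ng_per_ul * 1000.0 / molecular_weight


def stock_volume_ul(stock_um, target_um, final_volume_ul):
    """Volume of stock needed to reach target_um in final_volume_ul (C1V1 = C2V2)."""
    if target_um > stock_um:
        raise ValueError("target concentration exceeds stock concentration")
    return target_um * final_volume_ul / stock_um
```

Under these assumptions, each tenfold step in the 2, 0.2, 0.02, 0.002 μM series corresponds to one part of the previous dilution plus nine parts of water.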
Step 3: Mixing IVT products at different splicing isoform ratios
The IVT products were mixed at a 1:1 volume ratio as shown in Table 16.
Table 16: IVT product mixtures
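Because the two IVT products are combined at equal volumes, the theoretical isoform fraction of each mixture follows from the concentrations of the two dilutions being combined. The sketch below computes that fraction; the function name and the example concentrations in the usage note are hypothetical, since the contents of Table 16 are not reproduced here.

```python
def theoretical_retained_fraction(retained_um, spliced_um,
                                  retained_vol=1.0, spliced_vol=1.0):
    """Theoretical molar fraction of the intron-retention isoform in a mixture.

    Concentrations are in uM; volumes are in any common unit and default
    to the 1:1 volume ratio described in the text.
    """
    retained_amount = retained_um * retained_vol
    spliced_amount = spliced_um * spliced_vol
    total = retained_amount + spliced_amount
    if total == 0:
        raise ValueError("mixture contains no RNA")
    return retained_amount / total
```

For instance, mixing a 0.2 μM retained-intron dilution with a 2 μM normally spliced dilution at 1:1 volume would give a theoretical fraction of about 0.091.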
Step 4: Reverse transcription
Each of the mixed IVT products was diluted 100-fold and used as the template for first-strand cDNA synthesis. A reverse primer located on the exon downstream of the target intron served as the reverse transcription primer, and ordinary dNTPs together with 3'-modified dNTPs at a 1/15 molar concentration ratio were added to the reverse transcription system for the reverse transcription reaction.
The components of the reverse transcription reaction system are shown in Table 17, and the reverse transcription reaction conditions are shown in Table 18.
Table 17: Reverse transcription reaction system (20 μL/reaction)
Table 18: Reverse transcription reaction conditions
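To give a feel for how the ratio of 3'-modified to ordinary dNTPs could shape cDNA length, the sketch below uses an idealized chain-termination model. It assumes that "1/15" means one part modified to fifteen parts ordinary nucleotide and that the enzyme incorporates both with equal efficiency. Neither assumption comes from the source, and real reverse transcriptases may discriminate between the two, so actual lengths may differ substantially.

```python
def termination_probability(modified_parts, ordinary_parts):
    """Per-position chance of incorporating a terminating nucleotide (idealized)."""
    return modified_parts / (modified_parts + ordinary_parts)


def expected_extension_length(p_terminate):
    """Mean extension length in nt for a geometric termination model."""
    return 1.0 / p_terminate


def fraction_longer_than(p_terminate, length_nt):
    """Fraction of extensions that pass length_nt positions without terminating."""
    return (1.0 - p_terminate) ** length_nt
```

Under these simplifying assumptions, a 1:15 ratio gives a termination probability of 0.0625 per position and a mean extension of 16 nt. The model is only meant to show the direction of the effect: a higher proportion of modified dNTP shortens the fragments.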
Steps 5-6 were performed in the same way as steps 4-6 in Example 1.
Step 8: Targeted enrichment
Using the purified product above as the template, a PCR reaction was performed with the targeted gene-specific primer combination, designed against the exon downstream of the target intron, and one universal sequencing primer (PE1_p26).
The components of the PCR reaction system are shown in Table 19, the nucleic acid sequence of the PE1_p26 primer is shown in SEQ ID NO: 6, and the PCR reaction conditions are shown in Table 20. The primer sequences are shown in Table 21.
Table 19: PCR reaction system (25 μL/reaction)
Table 20: PCR reaction conditions
Table 21: Nucleic acid sequences of primers
Steps 9-10 were performed in the same way as steps 8-9 in Example 1.
High-throughput sequencing showed that, across the different mixing ratios, the measured values for the splicing isoform IVT products correlated with the theoretical values of the mixed samples with r > 0.99; the specific results are shown in Table 22 and Figure 5.
Table 22: Detection results for RNA simulating different ratios of splicing isoforms
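The correlation reported above can be computed as a Pearson coefficient between the theoretical and measured isoform fractions. The sketch below is a plain implementation of that formula; the source does not state which correlation statistic was used, so Pearson's r is an assumption, and any input values would be the reader's own rather than the data in Table 22.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / math.sqrt(var_x * var_y)
```

Because isoform fractions spanning several orders of magnitude are dominated by the largest values, it may also be worth checking the correlation on log-transformed values.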
Example 4: Detection of splicing isoforms in the HeLa cell line
The high-throughput sequencing platform described in Example 1 was used to detect splicing isoforms in the HeLa cell line.
Steps 1-2 were performed as in Example 1.
In the targeted reverse transcription of step 3, gene-specific primers were used as reverse transcription primers, and ordinary dNTPs together with 3'-modified dNTPs at a 1/15 molar concentration ratio were added to the reverse transcription system for the reverse transcription reaction.
The gene-specific primer sequences are shown in Table 23, the preparation of the gene-specific primer combination mixture is shown in Table 24, the components of the reverse transcription reaction system are shown in Table 25, and the reverse transcription reaction conditions are shown in Table 26.
Table 23: Gene-specific primer sequences for the reverse transcription reaction
Table 24: Preparation of the gene-specific primer mixture
Table 25: Reverse transcription reaction system (20 μL/reaction)
Table 26: Reverse transcription reaction conditions
Steps 4-6 were performed as in Example 1.
In the targeted enrichment of step 7, primers designed against the exon downstream of the retained intron in each of the ATP13A1, CXXC1, ECHDC2, FGFRL1, HMGN3, KLHL17, OSGEP, NAXD, LZTR1, SELENBP1, and JMJD8 genes were pooled as the targeted enrichment primer combination and used together with the universal sequencing primer PE1_p26 for the PCR reaction. The targeted enrichment PCR primers lie about 20-50 bp closer to the intron than the gene-specific primers used in reverse transcription. The primer sequences for each targeted gene are shown in Table 27, the preparation of the primer mixture is shown in Table 28, the components of the PCR reaction system are shown in Table 29, the nucleic acid sequence of the PE1_p26 primer is shown in SEQ ID NO: 6, and the PCR reaction conditions are shown in Table 30.
Table 27: Preparation of the primer combination mixture
Table 28: Preparation of the primer mixture
Table 29: PCR reaction system (25 μL/reaction)
Table 30: PCR reaction conditions
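The nested relationship between the two primer sets can be checked with a simple distance comparison: the enrichment primer should sit closer to the intron than the reverse transcription primer by roughly 20-50 bp. The sketch below assumes each primer is described by its distance in bp from the intron-exon junction; that representation and the function name are hypothetical.

```python
def is_nested(rt_primer_distance, enrichment_primer_distance,
              min_shift=20, max_shift=50):
    """Check that the enrichment primer lies min_shift..max_shift bp closer to the intron.

    Distances are measured in bp from the intron-exon junction to the primer site.
    """
    shift = rt_primer_distance - enrichment_primer_distance
    return min_shift <= shift <= max_shift
```

With a reverse transcription primer 120 bp from the junction, an enrichment primer at 90 bp passes the check (shift of 30 bp), while one at 115 bp (shift of 5 bp) does not.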
Steps 8-10 were performed as in Example 1.
The final product was subjected to high-throughput sequencing. The results showed that both the normally spliced isoform and the intron-retention splicing isoform were detected for the target genes in the HeLa cell line, and the proportion of the splicing isoform in each targeted gene was quantified; the specific results are shown in Table 31. These data indicate that the targeted high-throughput sequencing platform constructed in the present disclosure can effectively detect splicing isoforms.
Table 31: Detection results for target genes in the HeLa cell line
Example 5: Evaluation of trans-splicing off-target events
The high-throughput sequencing platform described in Example 1 was used to detect on-target and off-target trans-splicing events in HEK293T cells after transfection with a mini-gene and a trans-splicing factor (Pre-mRNA Trans-splicing Molecule).
Steps 1-2 were performed as in Example 1.
In the targeted reverse transcription of step 3, primers specific to the trans-splicing molecule and to the mini-gene were used as reverse transcription primers, and ordinary dNTPs together with 3'-modified dNTPs at a 1/15 molar concentration ratio were added to the reverse transcription system for the reverse transcription reaction.
The gene-specific primer sequences are shown in Table 32, the preparation of the gene-specific primer combination mixture is shown in Table 33, the components of the reverse transcription reaction system are shown in Table 34, and the reverse transcription reaction conditions are shown in Table 35.
Table 32: Gene-specific primer sequences for the reverse transcription reaction
Table 33: Preparation of the gene-specific primer mixture
Table 34: Reverse transcription reaction system (20 μL/reaction)
Table 35: Reverse transcription reaction conditions
Steps 4-6 were performed as in Example 1.
In the targeted enrichment of step 7, primers specific to the trans-splicing molecule and to the mini-gene were used as the targeted enrichment primer combination, together with the universal sequencing primer PE1_p26, for the PCR reaction. Compared with the specific primers used in reverse transcription, the targeted enrichment PCR primers for the trans-splicing molecule and the mini-gene lie about 20-50 bp closer to the 3' splice site and the 5' splice site, respectively. The primer sequences for each target are shown in Table 36, the preparation of the primer mixture is shown in Table 37, the components of the PCR reaction system are shown in Table 38, the nucleic acid sequence of the PE1_p26 primer is shown in SEQ ID NO: 6, and the PCR reaction conditions are shown in Table 39.
Table 36: Preparation of the primer combination mixture
Table 37: Preparation of the primer mixture
Table 38: PCR reaction system (25 μL/reaction)
Table 39: PCR reaction conditions
Steps 8-10 were performed as in Example 1.
The final product was subjected to high-throughput sequencing. The results showed that both on-target and off-target trans-splicing products were detected in HEK293T cells after transfection with the mini-gene and the trans-splicing factor (Pre-mRNA Trans-splicing Molecule). In addition, the genomic locations at which off-target trans-splicing by the trans-splicing factor occurred were precisely determined (the events were numerous and are not all listed); the specific results are shown in Table 40. These data indicate that the targeted high-throughput sequencing platform constructed in the present disclosure can effectively evaluate trans-splicing off-target events and can perform qualitative and quantitative analysis of on-target and off-target trans-splicing simultaneously.
Table 40: Detection results for on-target and off-target trans-splicing in HEK293T cells
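One way to picture the combined qualitative and quantitative readout is to label each chimeric read by the locus it is joined to, then tally on-target and off-target events. The sketch below assumes an upstream step has already assigned a partner locus to each read; the function name, the locus labels, and the input format are hypothetical and are not taken from Table 40.

```python
from collections import Counter


def summarize_trans_splicing(partner_loci, on_target_locus):
    """Tally trans-splicing events by partner locus.

    `partner_loci` holds one locus label per chimeric read; reads joined to
    `on_target_locus` count as on-target, all others as off-target.
    """
    counts = Counter(partner_loci)
    total = sum(counts.values())
    on_target = counts.get(on_target_locus, 0)
    off_target_counts = {
        locus: n for locus, n in counts.items() if locus != on_target_locus
    }
    return {
        "on_target_fraction": on_target / total if total else None,
        "off_target_counts": off_target_counts,
    }
```

The keys of `off_target_counts` identify where off-target events occurred, and the counts give their relative frequency.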
The embodiments described above represent only several implementations of the present disclosure. Although they are described specifically and in detail, they should not be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art may make modifications and improvements without departing from the concept of the present disclosure, and all of these fall within its scope of protection. Therefore, the scope of protection of the present disclosure shall be determined by the appended claims.
Claims (18)
- A method for establishing a sequencing library for high-throughput sequencing, comprising the following steps:
  (1) reverse transcribing sample RNA using a reverse transcription primer in the presence of ordinary dNTPs and 3'-modified dNTPs to obtain first-strand cDNA, wherein the 3'-modified dNTP is selected from one or more of the following: AzNTP, AmNTP, propargyl-NTP, HalNTP;
  (2) ligating an oligonucleotide fragment bearing a 5' alkyne modification to the cDNA fragment obtained in step (1) by a click chemistry reaction, wherein the oligonucleotide fragment comprises a random sequence and a segment complementary to universal sequencing primer 1 (seq1);
  (3) performing PCR amplification using the reaction product of step (2) as the template;
  (4) obtaining the sequencing library.
- The method according to claim 1, wherein the reverse transcription primer in step (1) is a random primer or gene-specific primer group 1 designed based on the exon downstream of the retained intron of the targeted gene; optionally, the primers in gene-specific primer group 1 carry the universal sequencing primer 2 sequence (seq2) at their 5' end.
- The method according to claim 1 or 2, wherein the PCR amplification primers used in step (3) comprise universal sequencing primer 1 and gene-specific primer group 2, and the primers in gene-specific primer group 2 carry the universal sequencing primer 2 sequence at their 5' end.
- The method according to claim 2 or 3, wherein gene-specific primer groups 1 and 2 are both designed based on the exon downstream of the retained-intron alternative splicing event of a specific targeted gene, and the target site of gene-specific primer 2 is shifted upstream of the site of gene-specific primer 1 by 5-100 bases, preferably 20-50 bases.
- The method according to any one of claims 2-4, wherein the number of target genes targeted by each of gene-specific primer group 1 and gene-specific primer group 2 is greater than or equal to 1.
- The method according to any one of claims 1-5, wherein the molar concentration ratio of the ordinary dNTPs to the 3'-modified dNTPs added in step (1) is 1:1-1:100.
- The method according to any one of claims 1-6, wherein the molar concentration ratio of the ordinary dNTPs to the 3'-modified dNTPs added in step (1) is 1:20.
- The method according to any one of claims 1-7, wherein the random sequence in step (2) comprises 4-16 nucleotides.
- The method according to claim 8, wherein the random sequence in step (2) is 5 nucleotides in length.
- The method according to any one of claims 1-9, wherein the PCR amplification of step (3) comprises amplifying the reaction product of step (2) by PCR using universal sequencing primer 1 and gene-specific primer group 2, and then adding an adapter structure to the amplified product, wherein the adapter structure comprises P5/P7 adapters and a nucleic acid barcode, and the nucleic acid barcode is attached at the P5 and/or P7 adapter end of the primer.
- The method according to claim 1, wherein step (3) is: performing a PCR reaction using the reaction product of step (2) as the template and adding P5 and P7 adapters.
- The method according to any one of claims 1-11, wherein the universal sequencing primer 1 sequence is selected from any one of SEQ ID NO: 3, SEQ ID NO: 5, or SEQ ID NO: 6; preferably, the sequence of the segment complementary to universal sequencing primer 1 is as shown in SEQ ID NO: 4.
- The method according to any one of claims 1-11, wherein the universal sequencing primer 2 sequence (seq2) is selected from any one of SEQ ID NO: 7 or SEQ ID NO: 8.
- A high-throughput sequencing library constructed by the method according to any one of claims 1-13.
- A targeted high-throughput sequencing method for detecting splicing isoforms, comprising the following steps:
  (1) extracting sample RNA and constructing a sequencing library by the method according to any one of claims 1-13;
  (2) based on the sequencing library, performing high-throughput sequencing to obtain sequencing information for the target gene in the sample.
- The method according to claim 15, wherein the sample is selected from the following cell samples: MCF10A, MCF7, HeLa, HEK293T, and/or MDA-MB-231.
- The method according to claim 15 or 16, wherein the target gene comprises ATP13A1, CXXC1, ECHDC2, FGFRL1, HMGN3, KLHL17, NAXD, LZTR1, SELENBP1, JMJD8, PSMB1, HIGD2A, HNRNPAB, SMARCC1, ATP5IF1, HIGD2B, RPS21, UQCC5, NFATC3, PCNP, and/or OSGEP.
- Use of the method for establishing a sequencing library for high-throughput sequencing according to any one of claims 1-13, the high-throughput sequencing library according to claim 14, and/or the targeted high-throughput sequencing method according to claims 15-17 in the evaluation of off-target events; preferably, the evaluation of off-target events comprises:
  (1) precisely determining the genomic location at which off-target trans-splicing by the trans-splicing factor occurs;
  (2) quantitatively analyzing on-target trans-splicing and off-target trans-splicing.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202480000814.7A CN118302538A (en) | 2023-04-27 | 2024-02-20 | Targeted high-throughput sequencing method for detecting splice isomers |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2023091367 | 2023-04-27 | ||
CNPCT/CN2023/091367 | 2023-04-27 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024222158A1 (en) | 2024-10-31 |
Family
ID=93255514
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2024/077716 WO2024222158A1 (en) | 2023-04-27 | 2024-02-20 | Targeted high-throughput sequencing method for detecting splicing isoform |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2024222158A1 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190256547A1 (en) * | 2018-02-22 | 2019-08-22 | Board Of Regents, The University Of Texas System | Poly(A)-ClickSeq Click-Chemistry for Next Generation 3-End Sequencing Without RNA Enrichment or Fragmentation |
CN111979583A (en) * | 2020-09-10 | 2020-11-24 | 杭州求臻医学检验实验室有限公司 | Construction method and application of single-stranded nucleic acid molecule high-throughput sequencing library |
CN113355391A (en) * | 2021-06-04 | 2021-09-07 | 翌圣生物科技(上海)股份有限公司 | Method for establishing database by targeting FFPE RNA |