
CN117437978A - Low-frequency gene mutation analysis method and device for second-generation sequencing data and application of low-frequency gene mutation analysis method and device - Google Patents

Info

Publication number
CN117437978A
Authority
CN
China
Prior art keywords
sequencing
sequence
base
sequences
family
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311696182.6A
Other languages
Chinese (zh)
Inventor
李宇龙
张钰
苏晓云
李彪
葛猛
叶锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Genomeprecision Technology Co ltd
Original Assignee
Beijing Genomeprecision Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Genomeprecision Technology Co ltd filed Critical Beijing Genomeprecision Technology Co ltd
Priority to CN202311696182.6A priority Critical patent/CN117437978A/en
Publication of CN117437978A publication Critical patent/CN117437978A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/50Mutagenesis
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10Sequence alignment; Homology search

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Biotechnology (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Molecular Biology (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a low-frequency gene mutation analysis method and device for second-generation sequencing data and an application thereof, and in particular relates to an analysis method and device for molecular tag sequencing data from the Ion Torrent sequencing platform and an application thereof. The invention provides a new molecular tag sequencing data analysis method suitable for multiple sequencing platforms, in particular Ion Torrent. The method determines the position of the barcode by searching for and locating the barcode sequence in the reads rather than by searching for and locating a fixed sequence. During consensus analysis it performs multiple sequence alignment of the reads in each family and corrects the base sequencing quality values. In addition to eliminating erroneous base substitutions introduced during sequencing, it eliminates erroneous insertions and deletions, so that low-frequency SNV and INDEL mutations can be detected accurately at the same time.

Description

Low-frequency gene mutation analysis method and device for second-generation sequencing data and application of low-frequency gene mutation analysis method and device
Technical Field
The invention belongs to the technical field of bioinformatics, relates to a low-frequency gene mutation analysis method and device of second-generation sequencing data and application thereof, and particularly relates to an analysis method and device of molecular tag sequencing data for an IonTorrent sequencing platform and application thereof.
Background
In the study and application of clinical precision medicine, low frequency (< 1%) somatic mutations including point mutations, insertions and deletions of genes have been a hotspot of interest.
NGS technology is widely used to detect genetic variation. However, owing to limitations of the technology itself, NGS introduces erroneous base information during sequencing, so that the target mutation to be detected can be masked by noise and cannot be detected correctly.
The Ion Torrent sequencer was the first commercial sequencer that requires no optical system. It uses semiconductor sequencing technology, in which a semiconductor chip converts chemical signals directly into digital signals. It is an economical, rapid, simple and scalable sequencing technology that is well suited to amplicon sequencing, and it is widely used because of its short sequencing time and inexpensive instrumentation. However, the sensor does not measure runs of consecutive identical bases (homopolymers) accurately, so the number of bases reported for such a run may be wrong. As a result, the target mutation to be detected can be masked by noise and cannot be detected correctly.
The Illumina sequencing platform is technically mature, but it still has a sequencing error rate of 0.1%-1%, which manifests mainly as base substitutions and an AT base bias.
To improve the accuracy of low-frequency mutation detection, molecular tags can be used to increase detection sensitivity. When the sequencing library is constructed, a random 6 bp sequence, called a barcode, is attached to each end of the molecule to be amplified. The barcode is amplified and sequenced together with the attached molecule in the downstream sequencing process. Reads with identical barcodes belong to the same family and can be regarded as amplified from the same original molecule. In theory, the reads within a family should be identical; by merging all of them into one consensus read through consensus analysis, base sequencing errors and redundancy introduced during sequencing can be eliminated.
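The family concept described above can be illustrated with a short Python sketch. It assumes the barcodes have already been extracted from each read; the tuple layout, the function name `group_by_barcode` and the example sequences are hypothetical and are not taken from the patent.

```python
from collections import defaultdict

def group_by_barcode(reads):
    """Group reads into families keyed by their (left, right) barcode pair.

    `reads` is an iterable of (left_barcode, right_barcode, insert) tuples.
    This layout is an assumption made for illustration only.
    """
    families = defaultdict(list)
    for left, right, insert in reads:
        families[(left, right)].append(insert)
    return dict(families)

# Hypothetical toy data: three reads share one barcode pair, one read has another.
reads = [
    ("ACGTAC", "TTGACA", "GATTACA"),
    ("ACGTAC", "TTGACA", "GATTACA"),
    ("ACGTAC", "TTGACA", "GATTCCA"),  # same family, one base differs (sequencing error)
    ("GGCATA", "CCTAGT", "GATTACA"),
]
families = group_by_barcode(reads)
```

Reads in the first family differ at a single base; consensus analysis across the family is what allows such an isolated error to be outvoted.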
Bioinformatics software for analyzing molecular tag sequencing data includes UMItools, fgbio, samtools, smCounter and Conner. UMItools, fgbio, smCounter and Conner are better suited to data generated by the Illumina platform than to data from the Ion Torrent platform. The consensus module of samtools can process sequencing data from multiple platforms, including Illumina and Ion Torrent, and performs well in detecting SNVs (single nucleotide variants). However, samtools refers to the reference genome when merging reads and removes the insertions and deletions originally present in the reads, so INDELs cannot be detected.
In conclusion, the development of the data analysis method suitable for the multiple sequencing platforms has important significance in the field of gene variation detection.
Disclosure of Invention
Aiming at the defects and actual demands of the prior art, the invention provides a low-frequency gene mutation analysis method and device for second-generation sequencing data and application thereof, in particular to an analysis method and device for molecular tag sequencing data of an IonTorrent sequencing platform and application thereof, and can accurately detect low-frequency SNV and INDEL mutations at the same time.
In order to achieve the above purpose, the invention adopts the following technical scheme:
in a first aspect, the present invention provides a method for low frequency gene mutation analysis of second generation sequencing data, the method comprising the steps of:
(1) Taking the sequence of the detected target gene as input, and establishing a database of blastn;
(2) Converting the sequenced fastq file into a fasta format file;
(3) Comparing the sequencing fasta sequence (i.e. reads) to a target gene by using blastn to obtain the position coordinates of the amplicon on the sequencing sequence, and extracting molecular tag sequence information;
(4) Grouping the sequencing sequences with the same molecular tag into the same family sequence (i.e. family), and filtering out families that contain fewer than 3 sequencing sequences;
(5) Performing multiple sequence alignment on all sequencing sequences in each family, introducing null values (i.e. gaps), counting the occurrences of A, T, C, G and null values across all sequencing sequences of the family at each position, calculating the base sequencing quality Q according to formula (1), adding 33 to Q and converting the result into the corresponding ASCII character, which is the corrected Phred33 sequencing quality value for each base in the fastq file;
$Q = -10 \log_{10}(P)$    formula (1)
where Q represents the base sequencing quality and P represents the base sequencing error probability;
(6) Merging all sequencing sequences in each family into a single sequencing sequence: at each position, the most frequently counted base is taken as the base of the consensus sequence; if the null value has the highest count at a position, the position is judged to contain a sequencing insertion error and is removed from the consensus sequence; if several bases have the same count, the base with the highest corrected base sequencing quality is taken; the consensus fastq file is thereby obtained.
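Step (6) can be sketched in Python as a column-by-column vote over reads that have already been gap-aligned. This is a minimal illustration, not the patented implementation: the function name `consensus` is hypothetical, the supplied per-base quality scores stand in for the "corrected" quality, and the handling of a tie between a gap and a base (the base wins) is an assumption, since the text does not specify it.

```python
from collections import Counter

GAP = "-"

def consensus(aligned_reads, aligned_quals):
    """Collapse one family of gap-aligned reads into a consensus sequence.

    aligned_reads: equal-length strings over A/C/G/T/'-'.
    aligned_quals: one list of integer quality scores per read, same length
                   as the read (the value stored at a gap is ignored).
    """
    out = []
    for col in range(len(aligned_reads[0])):
        counts = Counter(read[col] for read in aligned_reads)
        top = max(counts.values())
        tied = [sym for sym, n in counts.items() if n == top]
        if tied == [GAP]:
            # Gap is strictly the most frequent symbol: treat the column as
            # an insertion error and drop it from the consensus.
            continue
        bases = [sym for sym in tied if sym != GAP]

        def best_quality(base):
            # Highest quality among the reads supporting this base.
            return max(q[col] for r, q in zip(aligned_reads, aligned_quals)
                       if r[col] == base)

        # With a single candidate this simply returns it; with a tie it
        # picks the base with the highest supporting quality.
        out.append(max(bases, key=best_quality))
    return "".join(out)

# Hypothetical family: the fifth column is a gap in 3 of 4 reads.
family = ["ACGTTA", "ACGT-A", "ACGT-A", "ACG--A"]
quals = [[30] * 6 for _ in family]
collapsed = consensus(family, quals)

# Hypothetical tie: A and G each appear twice; G has the best quality.
tie_call = consensus(["A", "A", "G", "G"], [[10], [12], [30], [8]])
```

Because the vote is taken on the family's own alignment rather than against a reference genome, a deletion or insertion shared by most reads of a family survives into the consensus.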
The invention provides a new molecular tag sequencing data analysis method suitable for multiple sequencing platforms, in particular Ion Torrent. The barcode sequence in each read is found and located through blast, rather than by searching for a fixed sequence to determine the barcode position. During consensus analysis, the reads in each family undergo multiple sequence alignment and the base sequencing quality values are corrected, so that erroneous insertions and deletions are eliminated in addition to the erroneous base substitutions introduced during sequencing. The method is compatible with data from the mainstream second-generation sequencing platforms Illumina and Ion Torrent, works especially well for Ion Torrent data, and can accurately detect low-frequency SNV and INDEL mutations at the same time.
In the invention, blastn is used to align each read to the genome in order to determine the position of the barcode on that read, instead of extracting the barcode only by searching for the fixed sequences at the two ends of the read. This avoids the situation in which the barcode cannot be located because the fixed sequence itself contains errors caused by base synthesis errors or sequencing errors.
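The idea of locating the barcode from alignment coordinates can be sketched as follows. The sketch assumes BLAST tabular output with the default twelve columns (`-outfmt 6`), in which fields 7 and 8 are the 1-based start and end of the hit on the query (here, the read). It further assumes, purely for illustration, that each 6 bp barcode lies immediately next to the aligned amplicon; the real library layout is given by FIG. 2, which is not reproduced here. The function names and the example read are hypothetical.

```python
def parse_hit(line):
    """Parse one line of BLAST tabular output (-outfmt 6, default columns)."""
    fields = line.rstrip("\n").split("\t")
    return {
        "read_id": fields[0],
        "qstart": int(fields[6]),  # 1-based start of the hit on the read
        "qend": int(fields[7]),    # 1-based end of the hit on the read
    }

def extract_barcodes(read_seq, qstart, qend, barcode_len=6):
    """Return the (left, right) barcodes flanking the aligned amplicon.

    Returns None when the read is too short to contain both barcodes.
    Assumes the barcodes are directly adjacent to the aligned region.
    """
    lo, hi = min(qstart, qend), max(qstart, qend)
    left_start = lo - 1 - barcode_len
    right_end = hi + barcode_len
    if left_start < 0 or right_end > len(read_seq):
        return None
    return read_seq[left_start:lo - 1], read_seq[hi:right_end]

# Hypothetical read: 3 bp prefix, left barcode, 14 bp amplicon, right barcode, 2 bp suffix.
read = "CCC" + "ACGTAC" + "GATTACAGATTACA" + "TTGACA" + "GG"
hit = parse_hit("read1\tamplicon\t100.000\t14\t0\t0\t10\t23\t1\t14\t1e-05\t28.2\n")
barcodes = extract_barcodes(read, hit["qstart"], hit["qend"])
```

Since the coordinates come from aligning the amplicon itself, an error inside a fixed adapter sequence does not prevent the barcode from being found, which is the advantage the paragraph above describes.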
In the invention, the base sequencing quality is an important indicator in mutation detection: false-positive mutation sites can be removed by filtering out bases with low quality values.
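Formula (1) and the Phred33 encoding can be illustrated with a few lines of Python. The sketch only shows the conversion from an error probability P to a quality character; how P is derived from the per-position counts is not spelled out in the text and is not modeled here. Rounding Q to an integer and capping it at 93 (the largest score whose Phred+33 character, `~`, is still printable ASCII) are assumptions of this sketch.

```python
import math

MAX_Q = 93  # chr(93 + 33) == '~', the last printable ASCII character

def phred_quality(p_error):
    """Q = -10 * log10(P), rounded and clamped to the range 0..MAX_Q."""
    if p_error <= 0:
        return MAX_Q  # log10(0) is undefined; use the cap instead
    q = round(-10 * math.log10(p_error))
    return max(0, min(q, MAX_Q))

def phred33_char(p_error):
    """Encode an error probability as a Phred+33 quality character."""
    return chr(phred_quality(p_error) + 33)
```

For example, an error probability of 0.01 corresponds to Q = 20, which is stored in a fastq file as the character `5`.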
In a second aspect, the present invention provides an apparatus for analyzing second-generation sequencing data, the apparatus being configured to perform the steps in the low-frequency gene mutation analysis method for second-generation sequencing data according to the first aspect, comprising:
a database building unit: for establishing a blastn database with the sequence of the target gene to be detected as input;
a conversion unit: for converting the sequenced fastq file into a fasta format file;
a data acquisition unit: for aligning the sequencing fasta sequences to the target gene using blastn to obtain the position coordinates of the amplicon on each sequencing sequence, and extracting the molecular tag sequence information;
a classification unit: for grouping the sequencing sequences with the same molecular tag into the same family sequence, and filtering out families that contain fewer than 3 sequencing sequences;
a corrected sequencing quality calculation unit: for performing multiple sequence alignment on all sequencing sequences in each family, introducing null values, counting the occurrences of A, T, C, G and null values across all sequencing sequences of the family at each position, calculating the base sequencing quality Q according to formula (1), adding 33 to Q and converting the result into the corresponding ASCII character, which is the corrected Phred33 sequencing quality value for each base in the fastq file;
$Q = -10 \log_{10}(P)$    formula (1)
where Q represents the base sequencing quality and P represents the base sequencing error probability;
an analysis unit: for merging all sequencing sequences in each family into a single sequencing sequence, wherein at each position the most frequently counted base is taken as the base of the consensus sequence; if the null value has the highest count at a position, the position is judged to contain a sequencing insertion error and is removed from the consensus sequence; if several bases have the same count, the base with the highest corrected base sequencing quality is taken; the consensus fastq file is thereby obtained.
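As an illustration of the conversion unit listed above, the following Python sketch turns fastq text into fasta text. It assumes the common four-line fastq layout (header, sequence, `+`, quality) with no wrapped sequence lines; the function name `fastq_to_fasta` is hypothetical.

```python
def fastq_to_fasta(fastq_text):
    """Convert four-line FASTQ records into FASTA records.

    Quality lines are discarded; each '@name' header becomes '>name'.
    """
    lines = fastq_text.splitlines()
    records = []
    for i in range(0, len(lines) - 3, 4):
        header, seq = lines[i], lines[i + 1]
        if not header.startswith("@"):
            raise ValueError(f"unexpected FASTQ header: {header!r}")
        records.append(f">{header[1:]}\n{seq}")
    return "\n".join(records) + ("\n" if records else "")

example = "@r1\nACGT\n+\nIIII\n@r2\nGG\n+\n!!\n"
fasta = fastq_to_fasta(example)
```

The quality values dropped here are still needed later for the consensus step, so in practice the original fastq file would be kept alongside the fasta file.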
In a third aspect, the present invention provides use of the low-frequency gene mutation analysis method for second-generation sequencing data of the first aspect, or of the analysis device for second-generation sequencing data of the second aspect, in genetic variation detection.
In a fourth aspect, the present invention provides a method of detecting low frequency genetic variation, the method comprising:
performing second-generation sequencing on a sample to be tested; analyzing the second-generation sequencing data using the low-frequency gene mutation analysis method for second-generation sequencing data of the first aspect or the analysis device for second-generation sequencing data of the second aspect; performing genome alignment with alignment software based on the analysis result; detecting variation with variation detection analysis software; and outputting the variation result.
The invention develops a molecular tag sequencing data analysis method suitable for various sequencing platforms, carries out rapid analysis processing on sequencing data, further carries out mutation detection analysis, can accurately detect low-frequency SNV and INDEL mutation simultaneously, and has wide application prospect, such as the clinical accurate medical field, the research of gene mutation basic behaviors for non-disease diagnosis and the like.
Preferably, the alignment software comprises any one of bwa software, bowtie2 software, blast software and the like;
Preferably, the variation detection analysis software includes any one of Varscan2 software, Mutect2 software, GATK software, Freebayes software, and the like.
Preferably, the minimum base quality value of the variation detection analysis software is set to 20 to 25 when detecting SNVs, for example 20, 21, 22, 23, 24 or 25, and to 20 to 25 when detecting INDELs, for example 20, 21, 22, 23, 24 or 25.
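What a minimum base quality threshold means in practice can be shown with a small Python sketch. It is only an illustration of the concept and does not reproduce how Varscan2, Mutect2, GATK or Freebayes apply the setting internally; the function name and the example pileup are hypothetical.

```python
from collections import Counter

def count_usable_bases(pileup, min_q=20):
    """Count the bases at one position whose Phred+33 quality is at least min_q.

    `pileup` is a list of (base, quality_char) pairs for a single position.
    """
    return Counter(base for base, qchar in pileup if ord(qchar) - 33 >= min_q)

# Hypothetical observations: qualities 40, 20, 10 and 19.
pileup = [("A", "I"), ("A", "5"), ("T", "+"), ("A", "4")]
usable = count_usable_bases(pileup, min_q=20)
```

With a threshold of 20, the T observation (quality 10) and the last A (quality 19) are ignored, so only two A observations contribute to variant calling at this position.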
In a fifth aspect, the present invention provides a computer device comprising a memory and a processor, the memory storing a computer program/instruction which when executed by the processor implements the steps of the low frequency gene mutation analysis method of the second generation sequencing data of the first aspect or the steps of the method of detecting low frequency gene mutation of the fourth aspect.
In a sixth aspect, the present invention provides a computer-readable storage medium storing a computer program for causing a computer to establish and/or run the steps of the low frequency gene mutation analysis method of the second generation sequencing data as described in the first aspect or the steps of the method of detecting low frequency gene mutation as described in the fourth aspect.
Compared with the prior art, the invention has the following beneficial effects:
The invention provides a molecular tag sequencing data detection method suitable for multiple sequencing platforms, in particular Ion Torrent. The position of the barcode is determined by using a blast search to locate the barcode sequence in the reads rather than by searching for a fixed sequence. During consensus analysis, the reads in each family undergo multiple sequence alignment and the base sequencing quality values are corrected. In addition to eliminating erroneous base substitutions introduced during sequencing, the method eliminates erroneous insertions and deletions, and can accurately detect SNV and INDEL mutations at the same time.
Drawings
FIG. 1 is a schematic diagram of an analysis flow;
FIG. 2 is a schematic diagram of amplicon structure;
FIG. 3 is a schematic diagram of the consensus.
Detailed Description
The technical means adopted by the invention and the effects thereof are further described below with reference to the examples and the attached drawings. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof.
Where specific techniques or conditions are not indicated in the examples, they follow those described in the literature in this field or the product specifications. Reagents or equipment for which no manufacturer is noted are conventional products available for purchase through regular channels.
Example 1
Sample preparation: a cancer patient sample was first tested by first-generation sequencing to determine the proportion of the specific mutation site in the patient sample. The patient's cell line was then mixed with a wild-type cell line in defined proportions to prepare reference samples with various known mutation proportions.
Sample sequencing: libraries were constructed from the prepared samples and sequenced on Ion Torrent to obtain fastq files.
Data analysis: the analysis flow is shown in FIG. 1. The fastq file was converted into fasta format and aligned to the amplified template sequence using blastn to obtain the position coordinates of the amplicon on the reads. As shown in FIG. 2, a molecular tag and a fixed sequence are added to each end of the amplicon when the sequencing library is constructed; in the subsequent analysis, the front and rear molecular tags are used to remove base errors arising during PCR and sequencing and to recover the original sequence information. The position of the molecular tag in the sequencing sequence was located through the fixed sequence, and the molecular tag sequence information was extracted to obtain a fastq file containing the molecular tag information. After reads shorter than the amplified template were removed, families with fewer than 3 reads were filtered out. Multiple sequence alignment was performed on all sequencing sequences in each family, null values (i.e. gaps) were introduced, and the occurrences of A, T, C, G and null values across all sequencing sequences of the family were counted at each position. The base sequencing quality Q was calculated according to formula (1), 33 was added to Q, and the result was converted into the corresponding ASCII character, which is the corrected Phred33 sequencing quality value for each base in the fastq file. The reads in each family were then merged into one read using the consensus algorithm to obtain a consensus.fastq file.
FIG. 3 shows one family containing 6 sequencing sequences; after multiple sequence alignment the 6 sequences are reduced to 1 consensus read, which is regarded as the original base sequence. Where all 6 sequences carry the same base at a position, that base is regarded as the original base at that position. Where 2 of the 6 sequences carry a T at the end of a run of consecutive Ts and the other 4 have a deletion ("-") at the corresponding position, the original sequence is regarded as having no base at that position. Where 5 of the 6 sequences read GTGT and 1 sequence reads TGTG at the corresponding position, the original sequence is regarded as GTGTGT at that position. Where 5 of the 6 sequences carry an A at a position and 1 has a deletion at the corresponding position, the original sequence is regarded as A at that position.
The consensus.fastq file was aligned to the reference genome using bwa to obtain a sam file, and mutation sites were detected using samtools and Varscan2.
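The database-building and blastn alignment steps at the start of this analysis could be driven from Python roughly as follows. The sketch only assembles command lines for the BLAST+ tools `makeblastdb` and `blastn`; the file names, the database name and the choice of tabular output (`-outfmt 6`) are assumptions made for illustration, and the example does not state which options were actually used.

```python
import subprocess

def makeblastdb_cmd(target_fasta, db_name):
    """Command line that builds a nucleotide BLAST database from the template."""
    return ["makeblastdb", "-in", target_fasta, "-dbtype", "nucl", "-out", db_name]

def blastn_cmd(reads_fasta, db_name, out_tsv):
    """Command line that aligns reads to the database, writing tabular output."""
    return ["blastn", "-query", reads_fasta, "-db", db_name,
            "-outfmt", "6", "-out", out_tsv]

def run_alignment(target_fasta, reads_fasta, db_name="amplicon_db", out_tsv="hits.tsv"):
    """Run both steps; requires the BLAST+ executables to be installed."""
    subprocess.run(makeblastdb_cmd(target_fasta, db_name), check=True)
    subprocess.run(blastn_cmd(reads_fasta, db_name, out_tsv), check=True)
    return out_tsv

# Hypothetical file names, shown without executing anything.
db_cmd = makeblastdb_cmd("amplicon.fasta", "amplicon_db")
aln_cmd = blastn_cmd("reads.fasta", "amplicon_db", "hits.tsv")
```

Keeping the command construction separate from execution makes the options easy to inspect or log before any external tool is run.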
Samples with known mutation sites and mutation proportions were prepared and sequenced on Ion Torrent. The target mutation types include SNP and indel, and the target mutation proportions range from 1% to 30%.
The test results are shown in Table 1.
TABLE 1
Conclusion: the detected results agree with the expected results within the error range, which shows that the method has excellent performance in detecting point mutations and insertion mutations in Ion Torrent sequencing data.
Example 2
Samples with known mutation sites and mutation proportions were likewise prepared and sequenced on Illumina; the detection procedure was as described in Example 1. The target mutation type is indel, and the target mutation proportion is about 10%.
The results of the detection are shown in Table 2, which shows that the method of the invention is also applicable to the Illumina sequencing data.
TABLE 2
Comparative example 1
Samples one, two, three and four of Examples 1 and 2 were analyzed using the consensus module of samtools as a comparison technique. The results are shown in Table 3: the consensus module of samtools can detect SNPs but cannot analyze INDELs, whereas the method of the invention detects INDELs accurately.
TABLE 3
In summary, the invention provides a molecular tag sequencing data detection method suitable for multiple sequencing platforms, in particular Ion Torrent. The position of the barcode is determined by using a blast search to locate the barcode sequence in the reads rather than by searching for a fixed sequence, and during consensus analysis the reads in each family undergo multiple sequence alignment and the base sequencing quality values are corrected. The method therefore eliminates not only the erroneous base substitutions introduced during sequencing but also erroneous insertions and deletions, and can accurately detect low-frequency SNV and INDEL mutations at the same time.
The applicant states that the detailed method of the present invention is illustrated by the above examples, but the present invention is not limited to the detailed method described above, i.e. it does not mean that the present invention must be practiced in dependence upon the detailed method described above. It should be apparent to those skilled in the art that any modification of the present invention, equivalent substitution of raw materials for the product of the present invention, addition of auxiliary components, selection of specific modes, etc., falls within the scope of the present invention and the scope of disclosure.

Claims (10)

1. A method for low frequency gene mutation analysis of second generation sequencing data, the method comprising the steps of:
(1) Taking the sequence of the detected target gene as input, and establishing a database of blastn;
(2) Converting the sequenced fastq file into a fasta format file;
(3) Aligning the sequencing fasta sequences to the target gene using blastn to obtain the position coordinates of the amplicon on each sequencing sequence, and extracting molecular tag sequence information;
(4) Grouping the sequencing sequences with the same molecular tag into the same family sequence, and filtering out families that contain fewer than 3 sequencing sequences;
(5) Performing multiple sequence alignment on all sequencing sequences in each family, introducing null values, counting the occurrences of A, T, C, G and null values across all sequencing sequences of the family at each position, calculating the base sequencing quality Q according to formula (1), adding 33 to Q and converting the result into the corresponding ASCII character, which is the corrected Phred33 sequencing quality value for each base in the fastq file;
$Q = -10 \log_{10}(P)$    formula (1)
where Q represents the base sequencing quality and P represents the base sequencing error probability;
(6) Merging all sequencing sequences in each family into a single sequencing sequence, wherein at each position the most frequently counted base is taken as the base of the consensus sequence; if the null value has the highest count at a position, the position is judged to contain a sequencing insertion error and is removed from the consensus sequence; if several bases have the same count, the base with the highest corrected base sequencing quality is taken; the consensus fastq file is thereby obtained.
2. An apparatus for analyzing second-generation sequencing data, wherein the apparatus is configured to perform the steps of the method for analyzing low-frequency gene mutation of second-generation sequencing data according to claim 1, comprising:
a database building unit: for establishing a blastn database with the sequence of the target gene to be detected as input;
a conversion unit: for converting the sequenced fastq file into a fasta format file;
a data acquisition unit: for aligning the sequencing sequences to the target gene using blastn, obtaining the position coordinates of the amplicon on each sequencing sequence, and extracting molecular tag sequence information;
a classification unit: for grouping the sequencing fasta sequences with the same molecular tag into the same family sequence, and filtering out families that contain fewer than 3 sequencing sequences;
a corrected sequencing quality calculation unit: for performing multiple sequence alignment on all sequencing sequences in each family, introducing null values, counting the occurrences of A, T, C, G and null values across all sequencing sequences of the family at each position, calculating the base sequencing quality Q according to formula (1), adding 33 to Q and converting the result into the corresponding ASCII character, which is the corrected Phred33 sequencing quality value for each base in the fastq file;
$Q = -10 \log_{10}(P)$    formula (1)
where Q represents the base sequencing quality and P represents the base sequencing error probability;
an analysis unit: for merging all sequencing sequences in each family into a single sequencing sequence, wherein at each position the most frequently counted base is taken as the base of the consensus sequence; if the null value has the highest count at a position, the position is judged to contain a sequencing insertion error and is removed from the consensus sequence; if several bases have the same count, the base with the highest corrected base sequencing quality is taken; the consensus fastq file is thereby obtained.
3. Use of the low frequency gene mutation analysis method of the second generation sequencing data of claim 1 or the analysis device of the second generation sequencing data of claim 2 in gene mutation detection.
4. A method of detecting low frequency genetic variation, the method comprising:
performing second-generation sequencing on a sample to be tested; analyzing the second-generation sequencing data using the low-frequency gene mutation analysis method for second-generation sequencing data of claim 1 or the analysis device for second-generation sequencing data of claim 2; performing genome alignment with alignment software based on the analysis result; detecting variation with variation detection analysis software; and outputting the variation result.
5. The method of claim 4, wherein the alignment software comprises any of bwa software, bowtie2 software or blast software.
6. The method of claim 4, wherein the variation detection analysis software comprises any of Varscan2 software, Mutect2 software, GATK software, or Freebayes software.
7. The method of claim 6, wherein the minimum base quality value of the variation detection analysis software is 20-25 when detecting SNVs.
8. The method of claim 6, wherein the minimum base quality value of the variation detection analysis software is 20-25 when detecting INDELs.
9. A computer device comprising a memory and a processor, the memory storing a computer program/instruction, wherein the computer program/instruction when executed by the processor performs the steps of the low frequency gene mutation analysis method of the second generation sequencing data of claim 1 or the steps of the method of detecting low frequency gene variation of any one of claims 4-8.
10. A computer-readable storage medium storing a computer program, wherein the computer program causes a computer to establish and/or run the steps of the low frequency gene mutation analysis method of the second generation sequencing data of claim 1 or the steps of the method of detecting low frequency gene mutation of any one of claims 4 to 8.
CN202311696182.6A 2023-12-12 2023-12-12 Low-frequency gene mutation analysis method and device for second-generation sequencing data and application of low-frequency gene mutation analysis method and device Pending CN117437978A (en)


Publications (1)

Publication Number Publication Date
CN117437978A true CN117437978A (en) 2024-01-23

Family

ID=89553645


Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101921840A (en) * 2010-06-30 2010-12-22 深圳华大基因科技有限公司 DNA molecular label technology and DNA incomplete interrupt policy-based PCR sequencing method
CN106701897A (en) * 2015-11-12 2017-05-24 深圳华大基因研究院 Method and apparatus for simultaneously detecting gene point mutation, insertion/deletion and CNV
CN108154010A (en) * 2017-12-26 2018-06-12 东莞博奥木华基因科技有限公司 A kind of ctDNA low frequencies mutation sequencing data analysis method and device
CN110734908A (en) * 2019-11-15 2020-01-31 福州福瑞医学检验实验室有限公司 Construction method of high-throughput sequencing library and kit for library construction
US20210225456A1 (en) * 2018-07-27 2021-07-22 Myriad Women's Health, Inc. Method for detecting genetic variation in highly homologous sequences by independent alignment and pairing of sequence reads
KR20210112350A (en) * 2019-01-04 2021-09-14 윌리엄 마쉬 라이스 유니버시티 Quantitative amplicon sequencing for detection of multiple copy number variations and quantification of allele ratios
CN114530199A (en) * 2022-01-19 2022-05-24 重庆邮电大学 Method and device for detecting low-frequency mutation based on double sequencing data and storage medium
US20220364080A1 (en) * 2019-09-20 2022-11-17 Sophia Genetics S.A. Methods for dna library generation to facilitate the detection and reporting of low frequency variants
CN115369159A (en) * 2022-08-30 2022-11-22 上海交通大学医学院 Ultralow frequency mutation detection method based on double-end sequencing overlapping fragment and DNA double-strand complementary fragment
CN116469462A (en) * 2023-03-20 2023-07-21 重庆邮电大学 Ultra-low frequency DNA mutation identification method and device based on double sequencing

Similar Documents

Publication Publication Date Title
US10991453B2 (en) Alignment of nucleic acid sequences containing homopolymers based on signal values measured for nucleotide incorporations
EP2834762B1 (en) Sequence assembly
CN107229841B (en) A kind of genetic mutation appraisal procedure and system
Larsson et al. Comparative microarray analysis
CN113249453B (en) Method for detecting copy number change
Kearse et al. The Geneious 6.0. 3 read mapper
CN115691672B (en) Base quality value correction method and device for sequencing platform characteristics, electronic equipment and storage medium
CN115083521B (en) Method and system for identifying tumor cell group in single cell transcriptome sequencing data
KR101795662B1 (en) Apparatus and Method for Diagnosis of metabolic disease
CN106591451B (en) Method for determining the content of fetal free DNA and device for carrying out said method
CN109461473B (en) Method and device for acquiring concentration of free DNA of fetus
US20240221954A1 (en) Disease prediction methods and devices, electronic devices, and computer readable storage media
CN117437978A (en) Low-frequency gene mutation analysis method and device for second-generation sequencing data and application of low-frequency gene mutation analysis method and device
CN116072222B (en) Method for identifying and splicing viral genome and application thereof
CN116994649A (en) Intelligent judging method and intelligent judging system for gene detection data
WO2019213810A1 (en) Method, apparatus, and system for detecting chromosome aneuploidy
CN112885407B (en) Second-generation sequencing-based micro-haplotype detection and typing system and method
CN113409886A (en) HIV subtype classification system and classification method
Veeramachaneni Data Analysis in Rare Disease Diagnostics
JP2004219140A (en) Mass spectrum analyzing method and computer program
Liu et al. Systematic biases in reference-based plasma cell-free DNA fragmentomic profiling
CN113327646A (en) Sequencing sequence processing method and device, storage medium and electronic equipment
KR101907650B1 (en) Method of non-invasive trisomy detection of fetal aneuploidy
CN114171118B (en) Data processing method and device for noninvasive gene detection
CN117577182B (en) System for rapidly identifying drug identification sites and application thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20240123