WO2014036167A1 - Detecting variants in sequencing data and benchmarking - Google Patents
- Publication number
- WO2014036167A1 (application PCT/US2013/057128)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sequencing
- data
- variants
- tumor
- filters
- Prior art date
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/10—Ploidy or copy number detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
Definitions
- HHSN261201000055C awarded by the Department of Health and Human Services
- TECHNICAL FIELD [0003] This disclosure relates generally to sequencing data processing and benchmarking, and in particular, to detecting variants in sequencing data.
- Cancer is a disease of the genome wherein somatic genetic alterations transform normal cells into malignant cells. Detecting, cataloguing and interpreting these somatic events are at the core of a rapidly increasing number of cancer genome projects such as The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC), which involve thousands of cases harboring millions of mutations. As sequencing moves from research into clinical use, for example, as a diagnostic tool, even more cases will need to be characterized.
- Somatic single-nucleotide substitutions are an important and common mechanism for altering gene function in cancer. Yet, they are challenging to identify. First, they occur at a very low frequency in the genome, ranging from 0.1 to 100 mutations per megabase, depending on tumor type. Second, the alterations may be present only in a small fraction of the DNA molecules originating from the specific genomic locus. The reasons include contamination of cancer cells with surrounding stromal cells; local copy-number variation within the cancer genome; and presence of a mutation within only a sub-population of the tumor cells ('subclonality'). The fraction of DNA harboring an alteration ('allelic fraction') has been reported to be as low as 0.05 for highly impure tumors. Consequently, a mutation calling method must be highly sensitive to somatic mutations with very low allelic fractions (i.e. fraction of sequencing reads that support the mutation).
- The sensitivity and specificity of any somatic mutation caller vary along the genome. They depend on factors including, for example, depth of sequence coverage in the tumor and normal; the local sequencing error rate; the allelic fraction of the mutation; and the evidence thresholds used to declare a mutation. Understanding how sensitivity and specificity depend on these factors is necessary for designing experiments with adequate power to detect mutations at a given allelic fraction, as well as for inferring the mutation frequency along the genome, which is a key parameter for understanding mutational processes and significance analysis.
- The current subject matter relates to a computer-implemented method.
- The method can include receiving aligned sequencing data; applying one or more filters to the aligned sequencing data; using the filtered data as input, applying a first classifier to determine if any alteration is present beyond an expected threshold due to a sequencing error and identifying one or more candidate variants; passing the one or more identified candidate variants through one or more additional filters to remove one or more false positives; and determining a somatic status of the one or more filtered candidate variants using a second classifier.
- At least one of the above can be performed on at least one processor.
- The sequencing data may include DNA sequencing or RNA sequencing data.
- The one or more variants are mutations, point mutations, somatic point mutations, or germline point mutations.
- The one or more false positives are created by correlated sequencing noise.
- A Panel of Normals is used to identify one or more false positives.
- At least one of the first and second classifiers can be a Bayesian classifier.
- The one or more filters include a proximal gap filter, which rejects variants with neighboring insertion and/or deletion events. In some implementations, the one or more filters include a poor mapping region filter, which rejects sites having a determined mapping quality score of zero.
- The one or more filters include a clustered position filter, which looks for correlation in the position of mutant alleles within their reads.
- The one or more filters include a strand bias filter, which rejects sites where the distribution of strand observations of the mutant allele is biased compared to that of the allele of the reference genome.
- The one or more filters include a triallelic site filter, which excludes sites each having at least three alleles beyond what is expected by sequencing error.
- The one or more filters include an observed-in-control filter, which uses sequencing data from a matched normal as control data to eliminate sites where the reference genome has evidence of the mutant allele.
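The site-level filters listed above can be pictured as independent predicates over per-site summary statistics. The following is a minimal sketch only: the `SiteSummary` record, its field names, and every numeric cutoff are hypothetical placeholders and are not taken from this disclosure. The strand bias and clustered position filters are omitted because they need per-read detail.

```python
from dataclasses import dataclass

@dataclass
class SiteSummary:
    # Hypothetical per-site summary; all field names are illustrative.
    indel_nearby: bool         # an insertion/deletion event lies close to the site
    frac_mapq0_reads: float    # fraction of reads with mapping quality zero
    n_supported_alleles: int   # alleles with support beyond sequencing error
    alt_reads_in_normal: int   # mutant-allele reads seen in the matched normal

def failed_filters(site, max_mapq0_frac=0.5, max_alt_in_normal=1):
    """Return the names of the filters that reject this site.

    The two cutoffs are assumed values chosen only for the example.
    """
    failed = []
    if site.indel_nearby:
        failed.append("proximal_gap")
    if site.frac_mapq0_reads >= max_mapq0_frac:
        failed.append("poor_mapping")
    if site.n_supported_alleles >= 3:
        failed.append("triallelic_site")
    if site.alt_reads_in_normal > max_alt_in_normal:
        failed.append("observed_in_control")
    return failed

clean = SiteSummary(False, 0.0, 2, 0)
noisy = SiteSummary(True, 0.1, 3, 0)
print(failed_filters(clean))
print(failed_filters(noisy))
```

Returning the list of failed filters, rather than a single pass/fail flag, makes it possible to count how many sites each filter rejects on its own.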
- a system for detecting one or more variants from sequencing data can include means for receiving aligned sequencing data; means for applying one or more filters to the aligned sequencing data; means for using the filtered data as input, applying a first classifier to determine if any alteration is present beyond an expected threshold due to a sequencing error and identifying one or more candidate variants; means for passing the one or more identified candidate variants through one or more additional filters to remove one or more false positives; and means for determining a somatic status of the one or more filtered candidate variants using a second classifier.
- a method for benchmarking performance of variant detection includes providing variants that were discovered in deep- coverage data sets; down-sampling by randomly excluding a subset of reads of the data set at sites of known validated variants; repeating the down-sampling one or more times and estimating a sensitivity as a fraction of the times the known variants are detected. At least one of the above is performed by at least one data processor.
- A method for benchmarking performance of variant detection includes creating a normal virtual tumor that has no true variants; providing sequence data from a single normal sample; assigning reads of the sequence data to be either "tumor" or "normal" to a desired depth; and measuring specificity by comparing the normal virtual tumor against the sequence data. At least one of the above is performed by at least one data processor.
- Articles of manufacture are also described that comprise computer executable instructions permanently stored on non-transitory computer readable media, which, when executed by a computer, cause the computer to perform operations herein.
- Computer systems are also described that may include a processor and a memory coupled to the processor. The memory may temporarily or permanently store one or more programs that cause the processor to perform one or more of the operations described herein.
- Operations specified by methods described herein can be implemented by one or more data processors either within a single computing system or distributed among two or more computing systems.
- The present subject matter provides a novel somatic point mutation caller, which we call "MuTect," that is believed to be superior to prior methods in terms of sensitivity, particularly for low allelic fraction events, while remaining highly specific. This uniquely positions the method to deeply explore the mutational landscape of highly impure tumor samples, as well as the subclones within a tumor. The ability to characterize these subclonal events is critical not only to understanding tumor evolution, both in disease progression and in response to treatment, but also as a clinical diagnostic for personalized cancer therapy.
- A differentiator of the current subject matter, allowing it to be sensitive to low allele fraction mutations, is the explicit modeling of alternate alleles at any frequency, whereas alternative methods typically assume heterozygous genotypes as the basis for their calculations.
- Down-sampling: This approach involves studying somatic mutations that were discovered in deep-coverage cancer data sets and then experimentally validated, to see if these "gold-standard" mutations would have been found with lower coverage. Down-sampling can be accomplished, for example, by randomly excluding a subset of the reads at the sites of these validated mutations. For depths of coverage from 5x to 50x in the tumor and normal, the down-sampling procedure can be performed repeatedly and the sensitivity can be estimated as the fraction of times the known mutation is detected. Notably, down-sampling preserves the expected allelic fraction of the original mutation, because reads are removed regardless of whether or not they contain the alternate allele.
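The down-sampling procedure can be sketched as a small simulation. This is an illustration under stated assumptions: reads are reduced to the labels "ref" and "alt", and `toy_caller` with its `min_alt_reads` cutoff is a hypothetical stand-in for a real mutation caller, not the method of this disclosure.

```python
import random

def downsample(reads, depth, rng):
    """Randomly keep `depth` reads, ignoring which allele each read carries."""
    if depth >= len(reads):
        return list(reads)
    return rng.sample(reads, depth)

def estimate_sensitivity(reads, depth, is_detected, trials=1000, seed=0):
    """Fraction of down-sampled replicates in which the known mutation is called."""
    rng = random.Random(seed)
    hits = sum(is_detected(downsample(reads, depth, rng)) for _ in range(trials))
    return hits / trials

def toy_caller(reads, min_alt_reads=3):
    # Hypothetical stand-in for a caller: require a few alternate-allele reads.
    return sum(1 for r in reads if r == "alt") >= min_alt_reads

# A validated site sequenced to 100x, with 20 reads carrying the alternate allele.
deep_reads = ["alt"] * 20 + ["ref"] * 80
for depth in (5, 15, 30, 50):
    print(depth, estimate_sensitivity(deep_reads, depth, toy_caller))
```

Because reads are drawn without regard to allele, the expected allelic fraction of each replicate equals that of the deep data, which is the property the text relies on.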
- a virtual tumor can be created that has no true mutations. Using sequence data from a single normal sample, the reads can be assigned to be either 'tumor' or 'normal' to a desired depth. By applying methods to this virtual tumor-normal pair, the specificity of the method can be easily measured because any somatic mutations identified are necessarily false positives.
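A minimal sketch of the virtual tumor-normal split described above, assuming reads are held in a simple list. The function names are illustrative, and the rate helper simply normalizes a call count by the territory examined.

```python
import random

def split_virtual_pair(normal_reads, tumor_depth, normal_depth, seed=0):
    """Partition reads from ONE normal sample into disjoint 'tumor' and 'normal' sets."""
    if tumor_depth + normal_depth > len(normal_reads):
        raise ValueError("not enough reads for the requested depths")
    rng = random.Random(seed)
    shuffled = list(normal_reads)
    rng.shuffle(shuffled)
    virtual_tumor = shuffled[:tumor_depth]
    virtual_normal = shuffled[tumor_depth:tumor_depth + normal_depth]
    return virtual_tumor, virtual_normal

def false_positives_per_mb(n_calls, territory_bp):
    """Every somatic call on a mutation-free virtual tumor is a false positive."""
    return n_calls / (territory_bp / 1e6)

tumor, normal = split_virtual_pair(list(range(100)), tumor_depth=30, normal_depth=30)
print(len(tumor), len(normal), false_positives_per_mb(20, 1_000_000))
```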
- a virtual tumor can be created that has true mutations only at known sites.
- mutations can be introduced by substituting reads from a second normal sample ("B").
- sites at which B contains heterozygous germline variants not found in A can be identified.
- Reads in the virtual tumor with variant-containing reads from B can be replaced, following a binomial distribution given a specified allelic fraction.
- One advantage of using germline events is that they are frequent (~1000/Mb) and accurately detected, as they have often been genotyped by multiple technologies. In this manner, real sequencing data can be used to introduce somatic mutations within a virtual tumor to any desired depth and allelic fraction.
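The read-substitution step can be sketched as follows, assuming reads at a single site are list elements and that the number of replaced reads is drawn from a binomial distribution with the specified allelic fraction. The function name and the "ref"/"alt" labels are illustrative.

```python
import random

def spike_in(tumor_reads, b_variant_reads, allelic_fraction, seed=0):
    """Replace a Binomial(depth, allelic_fraction) number of virtual-tumor reads
    with variant-containing reads from sample B."""
    rng = random.Random(seed)
    depth = len(tumor_reads)
    # Binomial draw expressed as a sum of Bernoulli trials (standard library only).
    n_replace = sum(rng.random() < allelic_fraction for _ in range(depth))
    # Cannot replace more reads than sample B can donate.
    n_replace = min(n_replace, len(b_variant_reads))
    replace_at = set(rng.sample(range(depth), n_replace))
    donors = iter(rng.sample(b_variant_reads, n_replace))
    return [next(donors) if i in replace_at else read
            for i, read in enumerate(tumor_reads)]

virtual_tumor = spike_in(["ref"] * 30, ["alt"] * 30, allelic_fraction=0.2)
print(virtual_tumor.count("alt"), "of", len(virtual_tumor), "reads carry the variant")
```

Replacing reads, rather than adding them, keeps the virtual tumor at its intended depth.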
- The two benchmarking approaches can be complementary: down-sampling uses real somatic mutations but is limited to previously detected and validated mutations, whereas the virtual tumor approach can generate large datasets but reflects the distribution of events that occur in the germline.
- Figure 1 is a process flow diagram illustrating an exemplary implementation of the present subject matter.
- Figures 2a and 2b show sensitivity and specificity of results in accordance with some implementations of the present subject matter.
- Figures 3a-3f show various results of specificity of somatic classification and variant detection using an exemplary implementation of the present subject matter.
- Figures 4a-4d show comparisons of various benchmarks of implementations of the present subject matter against different detection methods.
- Figure 5 is a process flow diagram illustrating an exemplary implementation of the present subject matter.
- the present subject matter is directed to the detection of variants, which include, for example, alterations, allelic variants, mutations and polymorphisms.
- the sequencing data may include, for example, DNA, RNA, cDNA, and/or other genetic sequencing data.
- down-sampling can use subsets of reads from primary sequencing data of validated somatic mutations to measure the sensitivity with which a mutation caller identifies the known mutations.
- Subsets can be generated by randomly excluding reads from the experimentally-derived data set until a desired depth of coverage is reached.
- Down-sampling can preserve the expected allelic fraction of the original mutation because reads are removed regardless of whether or not they contain the mutant allele.
- the down-sampling approach can potentially be limited in four respects: (i) the number of validated events is typically small, resulting in larger error bars for the sensitivity estimate; (ii) because allele fractions are preserved, only previously validated allele fractions can be explored; (iii) the analysis excludes any mutations that were not originally detected and hence may overestimate the true sensitivity; and (iv) specificity cannot be measured.
- Virtual tumors and normals can be created, at controlled depths, from sequencing data generated by two different sequencing experiments of the same normal sample (designated A). All mutations identified are necessarily false positives.
- somatic mutations can be simulated at controlled allele fractions by replacing selected reads in the virtual tumor with reads from a second sample (designated B) at loci where sample A is reference and sample B harbors a high confidence germline heterozygous event. The ability of an algorithm to detect these simulated somatic mutations can then be assessed. In this manner, sensitivity can be measured using real sequencing data at a desired depth of coverage and allelic fraction.
- the two benchmarking approaches can be complementary. Down-sampling can use real somatic mutations, but can be limited in the parameter regimes it can explore, and it cannot measure specificity directly.
- the virtual tumor approach does not have these limitations. However, it simulates somatic mutations using germline events, which differ from somatic mutations in their nucleotide substitution frequencies and context. As recalibrated base qualities vary for the different bases (owing to biases in machine errors), there is variable sensitivity to detect different substitutions (Fig. 2). Because the difference in sensitivity is minimal, all the germline events can be chosen. However, it is possible with the virtual tumor approach to simulate the mutation spectrum of a specific tumor type by reweighting the germline events to match the expected mutation spectrum of the tumor.
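One way to picture the reweighting mentioned above is importance weighting by substitution class. This is a sketch under assumptions: each germline event is a `(site_id, substitution)` tuple, the target spectrum is a dictionary of class frequencies, and the numbers in the example are invented for illustration.

```python
import random
from collections import Counter

def spectrum_weights(germline_events, target_spectrum):
    """Weight each event by (target frequency) / (observed germline frequency)
    of its substitution class, so weighted sampling follows the target spectrum."""
    counts = Counter(sub for _, sub in germline_events)
    total = sum(counts.values())
    return [target_spectrum.get(sub, 0.0) / (counts[sub] / total)
            for _, sub in germline_events]

def sample_events(germline_events, target_spectrum, k, seed=0):
    """Draw k events (with replacement) according to the reweighted spectrum."""
    rng = random.Random(seed)
    weights = spectrum_weights(germline_events, target_spectrum)
    return rng.choices(germline_events, weights=weights, k=k)

# Invented example: the germline set is 80% C>T, the target spectrum is 50/50.
events = [(i, "C>T") for i in range(80)] + [(i, "A>G") for i in range(80, 100)]
target = {"C>T": 0.5, "A>G": 0.5}
picked = sample_events(events, target, k=2000)
print(Counter(sub for _, sub in picked))
```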
- The present subject matter takes as input sequence data from matched tumor and normal DNA, following alignment of the data to a reference genome and standard preprocessing steps. Examples of the preprocessing steps can be found in DePristo, M.A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genetics 43, 491-498 (2011), the contents of which are incorporated herein by reference. In some implementations, the present subject matter operates on each genomic locus
- The process flow 500 includes receiving sequencing data (e.g., DNA sequencing data) at 502, aligning the DNA sequencing data to a reference genome at 504, applying one or more filters to the aligned DNA sequencing data at 506, applying a first Bayesian classifier to the filtered data and identifying one or more candidate mutations at 508, applying one or more additional filters to the candidate mutation(s) at 510, and applying a second Bayesian classifier to the filtered candidate mutation(s) and determining a variant (somatic) status or classification of the filtered candidate mutation(s) at 512.
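The ordering of steps 506 to 512 can be sketched as a simple composition of callables. This is only an outline of the control flow: the filters and classifiers are passed in as placeholder functions, the dictionary keys in the example are hypothetical, and steps 502 and 504 (receiving and aligning the data) are assumed to have happened upstream.

```python
def call_variants(aligned_sites, pre_filters, detect, post_filters, classify):
    """Filter, detect, filter again, then classify each site independently."""
    results = []
    for site in aligned_sites:
        if not all(keep(site) for keep in pre_filters):    # step 506
            continue
        if not detect(site):                               # step 508, first classifier
            continue
        if not all(keep(site) for keep in post_filters):   # step 510
            continue
        results.append((site, classify(site)))             # step 512, second classifier
    return results

# Toy per-site records with hypothetical keys, used only to exercise the flow.
sites = [
    {"pos": 1, "good_quality": True, "alt_reads": 6, "artifact": False, "alt_in_normal": 0},
    {"pos": 2, "good_quality": True, "alt_reads": 6, "artifact": True, "alt_in_normal": 0},
    {"pos": 3, "good_quality": True, "alt_reads": 6, "artifact": False, "alt_in_normal": 5},
    {"pos": 4, "good_quality": False, "alt_reads": 6, "artifact": False, "alt_in_normal": 0},
]
calls = call_variants(
    sites,
    pre_filters=[lambda s: s["good_quality"]],
    detect=lambda s: s["alt_reads"] >= 3,
    post_filters=[lambda s: not s["artifact"]],
    classify=lambda s: "germline" if s["alt_in_normal"] else "somatic",
)
print([(s["pos"], status) for s, status in calls])
```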
- The present subject matter can take as input paired tumor and normal next-generation sequencing data and, after removing low-quality reads, determine if there is evidence for a variant beyond the expected random sequencing errors (variant detection will be discussed in more detail below).
- Candidate variant sites are then passed through, for example, one or more (including all) filters to remove sequencing and alignment artifacts:
- Triallelic Site filter excludes sites where there appear to be at least three alleles at the site beyond what is expected by sequencing error suggesting a site not accurately modeled by the base quality scores;
- a Panel of Normals can be used to screen out remaining false positives caused by rare error modes only detectable using more samples. Finally, the somatic or germline status of passing variants is determined using the matched normal.
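A Panel of Normals screen can be sketched as a set lookup. The rule used here, rejecting any site called in at least `min_normals` normal samples, and the default of two are assumptions made for the example; the disclosure does not specify them.

```python
from collections import Counter

def build_panel(normal_call_sets, min_normals=2):
    """Sites called as variant in at least `min_normals` normal samples."""
    counts = Counter(site for calls in normal_call_sets for site in set(calls))
    return {site for site, n in counts.items() if n >= min_normals}

def apply_panel(candidates, panel):
    """Keep only the candidate sites that are absent from the panel."""
    return [site for site in candidates if site not in panel]

normals = [
    {("chr1", 100), ("chr2", 5)},
    {("chr1", 100)},
    {("chr3", 7)},
]
panel = build_panel(normals)
print(apply_panel([("chr1", 100), ("chr3", 7)], panel))
```

Requiring recurrence across several normals targets artifacts that are rare within one sample but reproducible across samples, which is the error mode the text describes.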
- the present subject matter can take as input sequence data from matched tumor and normal DNA after alignment of the reads to a reference genome and preprocessing steps discussed above, which include, for example, marking of duplicate reads, recalibration of base quality scores and local realignment.
- The method operates on each genomic locus independently and consists of four key steps (Fig. 1): (i) removal of low-quality sequence data (based on known methods); (ii) variant detection in the tumor using a Bayesian classifier; (iii) filtering to remove false positives resulting from correlated sequencing artifacts that are not captured by the error model; and (iv) designation of the variants as somatic or germline by a second Bayesian classifier.
- variants in the tumor can be identified by analyzing the data at each site under, for example, two alternative models:
- A variant model M1, which assumes the site contains a true variant allele m at allele fraction f and also allows, as in the reference model M0, for the possibility of sequencing errors.
- The allele fraction f is unknown and is estimated as the fraction of tumor reads that support m.
- m can be declared to be a candidate variant if the log-likelihood ratio of the data under the variant and reference models - that is, the LOD score (log odds) - exceeds a predefined decision threshold that depends on the expected mutation frequency and the desired false positive rate (Online Methods).
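The LOD score can be illustrated with a short calculation. The per-read likelihood below is an assumption, since the text does not give one: base calls are treated as independent, Phred qualities are converted to error probabilities, an erroneous base is taken to be equally likely to be any of the three other nucleotides, and the reference model is taken as the variant model with allele fraction zero.

```python
import math

def read_log10_likelihood(base, err, ref, alt, f):
    """log10 P(base | allele fraction f) under the assumed error model."""
    if base == ref:
        p = f * err / 3 + (1 - f) * (1 - err)
    elif base == alt:
        p = f * (1 - err) + (1 - f) * err / 3
    else:
        p = err / 3
    return math.log10(p)

def tumor_lod(bases, quals, ref, alt):
    """log10 likelihood ratio of the variant model (at the estimated allele
    fraction) to the reference model (allele fraction zero)."""
    errs = [10 ** (-q / 10) for q in quals]
    f_hat = sum(b == alt for b in bases) / len(bases)
    l_variant = sum(read_log10_likelihood(b, e, ref, alt, f_hat)
                    for b, e in zip(bases, errs))
    l_reference = sum(read_log10_likelihood(b, e, ref, alt, 0.0)
                      for b, e in zip(bases, errs))
    return l_variant - l_reference

# 30 reads at base quality 35, six of which carry the alternate allele.
bases = ["C"] * 6 + ["A"] * 24
print(tumor_lod(bases, [35] * 30, ref="A", alt="C"))
```

A site would be declared a candidate when this value exceeds the decision threshold.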
- ROC Receiver Operating Characteristic
- The LOD score is useful as a threshold for declaring the presence of mutations, as can be observed in the concordance of predicted sensitivity and measured sensitivity from the virtual tumor approach (Figure 2a, solid grey line vs. dashed line). Nonetheless, the LOD score cannot be immediately translated into the probability that a variant is due to true mutation rather than to sequencing error, because the LOD score is calculated under an assumption of independent sequencing errors and accurate read placement. As will be discussed below, these assumptions are incorrect and, as a result, although direct application of the LOD score accurately estimates the sensitivity to detect a mutation, it can substantially underestimate the true false positive rate.
- Figures 2a and 2b show the sensitivity as a function of sequencing depth and allelic fraction.
- Results using a model of independent sequencing errors with uniform Q35 base quality scores and accurate read placement (solid grey) are shown as well as results from the virtual tumor approach for the standard (STD, dashed green) and high-confidence (HC, solid green) configuration.
- A typical setting of the decision threshold (6.3) is marked with black dots.
- The calculated sensitivity using a model of independent sequencing errors and accurate read placement with uniform Q35 base quality scores (solid lines) is shown, as well as results from the virtual tumor approach (circles) and the down-sampling of validated colorectal mutations (diamonds). Error bars represent 95% CIs.
- the sensitivity of the method is similar as estimated by the calculation and the virtual tumor benchmark both with (HC) and without (STD) filters. This demonstrates that the model is accurate with respect to detection and that the filters do not adversely impact sensitivity.
- each variant detected in the tumor is designated as somatic (not present in the matched normal), germline (present in the matched normal) or variant (present in the tumor, but indeterminate status in the matched normal due to insufficient data).
- a LOD score can be used that compares the likelihood of the data under models in which the variant is present (at 50% frequency) or absent in the matched normal (Online Methods).
- the power to make a germline classification given the data and threshold can be calculated.
- insufficient data for classification is declared if there is less than 95% power.
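The matched-normal classification can be sketched with the same style of likelihood. Two things here are assumptions rather than the disclosed method: the per-read error model (errors spread evenly over the three other bases), and the use of a symmetric LOD threshold in place of the 95% power calculation to decide when data are insufficient. The threshold value of 2.0 is a placeholder.

```python
import math

def log10_lik(bases, quals, ref, alt, f):
    """log10 likelihood of the normal reads if the variant is at fraction f."""
    total = 0.0
    for b, q in zip(bases, quals):
        e = 10 ** (-q / 10)
        if b == ref:
            p = f * e / 3 + (1 - f) * (1 - e)
        elif b == alt:
            p = f * (1 - e) + (1 - f) * e / 3
        else:
            p = e / 3
        total += math.log10(p)
    return total

def classify_in_normal(bases, quals, ref, alt, threshold=2.0):
    """Compare 'absent in normal' (f = 0) with 'present at 50%' (f = 0.5)."""
    lod = log10_lik(bases, quals, ref, alt, 0.0) - log10_lik(bases, quals, ref, alt, 0.5)
    if lod >= threshold:
        return "somatic"    # strong evidence the normal lacks the variant
    if lod <= -threshold:
        return "germline"   # strong evidence the normal carries the variant
    return "variant"        # indeterminate: too little data in the normal

print(classify_in_normal(["A"] * 30, [35] * 30, "A", "C"))
print(classify_in_normal(["A"] * 15 + ["C"] * 15, [35] * 30, "A", "C"))
print(classify_in_normal(["A"] * 3, [35] * 3, "A", "C"))
```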
- Public germline variation databases can be used as a prior probability of an event being germline.
- Sensitivity
- These benchmarking methods can be applied to further evaluate the sensitivity of the mutation detection method, with the different filtering options (STD, HC and HC+PON), to detect mutations as a function of sequencing depth and allelic fraction (Figure 2b).
- the sensitivity can be calculated under a model of independent sequencing errors and accurate read placement using, for example, a statistical test given an allelic fraction; tumor sequencing depth; and assuming all bases have a fixed base quality score of Q35 (approximate mean base quality score in simulation data; Online Methods).
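The calculated sensitivity can be sketched as a binomial sum. The assumptions are: every base has the same quality, the number of alternate-allele reads at depth n is binomial, the LOD uses a per-read likelihood in which errors are spread evenly over the three other bases, and a site is called when the LOD reaches the decision threshold. The value 6.3 in the usage lines is the typical threshold setting mentioned in the text.

```python
import math

def lod_uniform(k, n, e):
    """LOD for k alternate reads out of n when all bases share error rate e."""
    if k == 0:
        return 0.0
    f = k / n
    per_alt = math.log10(f * (1 - e) + (1 - f) * e / 3) - math.log10(e / 3)
    per_ref = math.log10(f * e / 3 + (1 - f) * (1 - e)) - math.log10(1 - e)
    return k * per_alt + (n - k) * per_ref

def calculated_sensitivity(depth, allelic_fraction, threshold, qual=35):
    """P(LOD >= threshold) when alternate-read counts are binomially distributed."""
    e = 10 ** (-qual / 10)
    # Probability that a single read shows the alternate base.
    p_alt = allelic_fraction * (1 - e) + (1 - allelic_fraction) * e / 3
    return sum(math.comb(depth, k) * p_alt ** k * (1 - p_alt) ** (depth - k)
               for k in range(depth + 1)
               if lod_uniform(k, depth, e) >= threshold)

for depth, fraction in [(30, 0.2), (50, 0.2), (30, 0.1)]:
    print(depth, fraction, round(calculated_sensitivity(depth, fraction, 6.3), 3))
```

The printed values can be compared with the sensitivities quoted in the text for the same depths and allelic fractions.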
- HC+PON may not be used in the virtual tumor sensitivity benchmark because it discards common germline sites.
- The present subject matter is a highly sensitive detection method. It can detect mutations, for example, at a site with 30x depth in the tumor (typical of whole genome sequencing) and an allele fraction of 0.2 with 95.6% sensitivity.
- The sensitivity can be increased to 99.9% by sequencing deeper (e.g., to a depth of 50x), and drops to, e.g., 58.9% for detecting mutations with allelic fraction of 0.1 (at 30x) (Figure 2b).
- The present subject matter can have, e.g., 66% sensitivity for 3% allele fraction events. It is this sensitivity to detect low-allele fraction events that uniquely positions the present subject matter to analyze samples with low purity or with complex subclonal structure.
- The virtual tumor approach can be used, for example, across 1 Gb of NA12878 at various depths in the virtual tumor and at 30x in the virtual normal. All detected events are false positives, but to eliminate those due to under-calling germline events from consideration, we excluded all known germline variant sites.
- STD no filters
- The false positive rate increased with depth (from 6.7/Mb at 5x to 20.1/Mb at 30x) (Fig. 3a). This is due to the increased power to call mutations with lower allele fractions, which are enriched with false positives (Fig. 3b).
- The HC filters reduce the false positive rate by an order of magnitude (1.00/Mb at 30x).
- The Panel of Normals filters out remaining rare, but recurrent, artifacts (0.51/Mb at 30x).
- Certain filters, such as the Poor Mapping filter, have the biggest effect at low depths, whereas other filters, such as the Proximal Gap filter, are more invariant to depth (Fig. 3c).
- the Clustered Position filter rejects the most sites exclusively. However, the majority of false positives are rejected by several filters.
- the filters specifically address these additional errors and reduce the false positive rate by an order of magnitude (from 21.3/mb to 0.90/mb at 30x tumor depth).
- the Panel of Normals (HC+PON) then filters out remaining rare, but recurrent, artifacts. Certain filters, such as the Poor Mapping filter, have the biggest effect at low depths, whereas other filters, such as the Proximal Gap filter, are more invariant to depth (Figure 3c).
- the Clustered Position filter rejects the most sites exclusively, although multiple filters reject the majority of false positives.
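To make the distinction between "rejected by a filter" and "rejected exclusively by a filter" concrete, here is a minimal sketch that tallies both counts. The filter labels and the per-site dictionary are hypothetical inputs chosen for illustration.

```python
from collections import Counter


def filter_rejection_counts(failed_filters_by_site):
    """Count, per filter, all rejected sites and sites rejected by that filter alone."""
    total = Counter()
    exclusive = Counter()
    for failed in failed_filters_by_site.values():
        failed = set(failed)
        total.update(failed)
        if len(failed) == 1:  # only one filter rejected this site
            exclusive.update(failed)
    return total, exclusive


# Hypothetical per-site filter failures.
sites = {
    ("20", 100): {"clustered_position"},
    ("20", 200): {"poor_mapping", "proximal_gap"},
    ("20", 300): {"clustered_position", "strand_bias"},
}
total, exclusive = filter_rejection_counts(sites)
```

Dividing each count by the number of rejected sites would give the kind of fractions plotted per filter in Figure 3c.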
- false positives can be further reduced by taking each read in the tumor and normal, and realigning them to a reference genome with stringent alignment settings.
- the resulting alignments can be re-processed by the present subject matter to see if enough evidence for the mutation exists after considering the more stringent alignments.
- Figures 4a-4d show the benchmarking of mutation detection methods. Specifically, the sensitivity of the methods was evaluated with regard to allele fraction and tumor sequencing depth using the virtual tumor (Fig. 4a) and down-sampling approaches, and a sharp distinction in sensitivity was observed, particularly at lower allele fractions. Data were analyzed for 30x sequence coverage. In the standard configurations, all methods show > 99.3% sensitivity for mutations at an allele fraction of 0.4. However, in the HC configurations, the present subject matter, JointSNVMix and Strelka remain sensitive (98.8%, 96.6% and 98.5%, respectively), whereas SomaticSniper drops to 91.5%.
- the present subject matter HC can detect more than half of the mutations (53.2%), whereas Strelka HC detects only 29.7%, JointSNVMix HC drops to 16.8% and SomaticSniper HC falls to 7.4%.
- the present subject matter HC has 16.0% sensitivity, which can be increased to 51.9% with 60x coverage.
- SomaticSniper HC has a sensitivity of < 2.0%, and the sensitivity does not increase appreciably with tumor sequencing depth. Strelka HC detects just 4.6% of the events at 30x and only increases to 20.8% at 60x. Sensitivity for such low allelic fraction events is critical for characterizing impure tumors or subclonal mutations in heterogeneous tumors, and it appears that the present subject matter is much more sensitive in this regime.
- the cancer genome community will greatly benefit from a systematic performance measurement using the approaches described here across the entire parameter space of tumor and normal depths and mutation allele fraction.
- the approaches described herein can also be extended in the future to other alterations such as indels or rearrangements.
- the cancer genome community is eager to adopt new and improved methods but require detailed
- the present subject matter is shown to be much more sensitive at a given specificity than competing methods, allowing one to more comprehensively characterize the landscape of somatic mutations, particularly those present in a small fraction of cancer cells. Moreover, this can be done with standard sequencing depths enabling analysis of the large datasets that are being generated worldwide. Analysis of subclonal mutations and changes in the fractions of cancer cells which harbor them is a powerful way to study the evolution of subclones as they progress during treatment, metastasis and relapse. In particular, we demonstrated that the presence of subclonal mutations in genes involved in driving chronic lymphocytic leukemia (CLL) is an independent prognostic factor beyond the currently used clinical parameters.
- Figure 1 is an overview of somatic point mutation detection using the present subject matter.
- the present subject matter takes as input tumor (T) and normal (N) next generation sequencing data and, after removing low quality reads, determines if there is evidence for a variant beyond the expected random sequencing errors.
- Candidate variant sites are then passed through six filters to remove artifacts (Table 1).
- a Panel of Normals can be used to screen out remaining false positives caused by rare error modes only detectable in additional samples.
- somatic or germline status of passing variants is determined using the matched normal.
- Figure 2 shows sensitivity as a function of sequencing depth and allelic fraction.
- Figure 3 shows specificity of variant detection and variant classification using the virtual tumor approach. (a) Somatic miscall error rate for true reference sites as a function of tumor sequencing depth for the STD (red), HC (blue) and HC+PON (green) configurations of the present subject matter. Error bars represent 95% CIs. (b) Distribution of allele fraction for all miscalls as a function of tumor sequencing depth. (c) Fraction of events rejected by each filter; hashed regions indicate events rejected exclusively by each filter. (d) Somatic miscall error rate for true germline SNP sites by sequencing depth in the normal when the site is known to be variant in the population (blue) and novel (red). Error bars represent 95% CIs. (e,f) Mean power as a function of sequencing depth in the normal to have classified these events as germline or somatic at novel germline sites (e) and known germline variant sites (f).
- Figure 4 shows benchmarking mutation detection methods
- Reads are preprocessed differently according to how they will be used: detection of the variant in the tumor, discovery of an artifact in the normal or for somatic classification.
- This filter attempts to remove false positives caused by nearby misaligned small insertion and deletion events.
- the site can be rejected if there are > 3 reads with insertions within an 11 bp window centered on the candidate mutation OR if there are > 3 reads with deletions within the same 11 bp window.
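A minimal sketch of this window test follows, assuming the indel positions have already been extracted from the reads (one reference position per read carrying an insertion or deletion). The function name, its parameters and the strict comparison are assumptions based on the wording above, not the actual implementation.

```python
def proximal_gap_reject(candidate_pos, insertion_positions, deletion_positions,
                        window=11, max_reads=3):
    """Return True if the candidate site should be rejected.

    insertion_positions / deletion_positions: one reference position per read
    carrying an insertion / deletion (hypothetical pre-extracted input).
    """
    half = window // 2  # 5 bases on each side of the candidate for an 11-bp window

    def count_nearby(positions):
        return sum(1 for p in positions if abs(p - candidate_pos) <= half)

    # Insertions and deletions are counted separately, then combined with OR.
    return (count_nearby(insertion_positions) > max_reads
            or count_nearby(deletion_positions) > max_reads)
```

Passing `max_reads=2` would give a "three or more reads" reading of the threshold instead of "more than three".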
- This filter attempts to remove false positives caused by sequence similarity in the genome by looking at the fraction of reads which have a mapping quality score of zero. Candidates are rejected if > 50% of the reads in the tumor and normal have a mapping quality of zero.
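The mapping-quality test can be sketched as below. Pooling tumor and normal reads before computing the fraction, and passing sites with no reads, are assumptions made for this illustration; the text does not say how the two samples are combined.

```python
def poor_mapping_reject(tumor_mapqs, normal_mapqs, max_fraction=0.5):
    """Return True if more than max_fraction of pooled reads have MAPQ 0."""
    mapqs = list(tumor_mapqs) + list(normal_mapqs)
    if not mapqs:
        return False  # assumption: no reads means nothing to reject on
    zero_fraction = sum(1 for q in mapqs if q == 0) / len(mapqs)
    return zero_fraction > max_fraction
```

For example, three MAPQ-0 reads out of four pooled reads gives a fraction of 0.75 and the candidate is rejected, whereas exactly half does not exceed the 50% cutoff.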
- This filter attempts to reject false positives caused by calling triallelic sites where the normal is heterozygous with alleles A/B and the present subject matter is considering an alteration of allele C. Although this is biologically possible, and remains an area for future improvement in mutation detection, calling at these sites generates many false positives and therefore they are currently filtered out by default.
- This filter attempts to reject false positives caused by context specific sequencing errors where the vast majority of the alternate alleles are observed in a single direction of reads. In some implementations, this test is performed by stratifying the reads by direction and then performing the core detection statistic on the data.
- the method calculates the median and median absolute deviation of the distance from both the start and end of the read and rejects sites that have a median ≤ 10 (near the start/end of the alignment) and a median absolute deviation ≤ 3 (clustered).
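As a sketch of this clustered-position test, the code below computes the median and median absolute deviation (MAD) of the alternate allele's distance from each end of the read. It assumes both thresholds are upper bounds (median ≤ 10, MAD ≤ 3) and that a site is rejected when either end satisfies both; the input representation (0-based offsets plus read lengths) is hypothetical.

```python
from statistics import median


def median_and_mad(values):
    """Median and median absolute deviation of a non-empty sequence."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return med, mad


def clustered_position_reject(offsets_from_start, read_lengths,
                              max_median=10, max_mad=3):
    """Return True if alternate alleles cluster near the start or end of reads."""
    from_start = list(offsets_from_start)
    if not from_start:
        return False
    # Distance of the same base from the other end of each read.
    from_end = [length - 1 - off for off, length in zip(from_start, read_lengths)]
    for distances in (from_start, from_end):
        med, mad = median_and_mad(distances)
        if med <= max_median and mad <= max_mad:
            return True
    return False
```

Offsets spread evenly along the read produce a large median from both ends and pass, while offsets packed into the first or last few bases are rejected.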
- a panel of normal samples can be employed as a screen.
- the present subject matter can be run on them as if they were tumors without a matched normal and with all artifact processing disabled (--artifact_detection_mode). From these data, a VCF file is created for the sites that were identified by the present subject matter in two or more samples.
- This VCF can then be supplied to the caller, which rejects these sites. However, if the site is present in the supplied VCF of known mutations (--cosmic), it is retained, because such sites could represent known recurrent somatic mutations that were detected in the panel of normals when the normals are from adjacent tissue or contain some contaminating tumor DNA.
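The panel-of-normals logic can be sketched with plain sets, leaving VCF parsing aside. The function names and the `(chrom, pos)` site representation are assumptions for illustration; only the "two or more samples" rule and the known-mutation exemption come from the text.

```python
from collections import Counter


def build_panel_of_normals(calls_per_normal, min_samples=2):
    """Sites called in at least min_samples normal samples."""
    counts = Counter()
    for calls in calls_per_normal:
        counts.update(set(calls))  # count each site once per sample
    return {site for site, n in counts.items() if n >= min_samples}


def panel_of_normals_reject(site, panel, known_mutations):
    """Reject sites in the panel unless they are known recurrent mutations."""
    return site in panel and site not in known_mutations


# Hypothetical calls from three normal samples.
normals = [
    [("20", 100), ("20", 200)],
    [("20", 100)],
    [("20", 300), ("20", 200)],
]
panel = build_panel_of_normals(normals)
known = {("20", 200)}
```

With these inputs the panel holds the two sites seen in two samples each; the one also listed among known mutations is retained rather than rejected.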
- Variant (Somatic) Classification: To perform this classification, we use a classifier similar to the one described above. In this case, the allele fraction f in the variant model is conservatively set to 0.5 for a germline heterozygous variant.
- a threshold of 10 can be set, which is higher than the threshold for ⁇ ⁇ so as to obtain more confidence in the somatic classification as misclassified germline events will quickly appear to be significant in downstream somatic analysis due to their elevated population frequency at recurrent sites as compared to real somatic events.
- the public dbSNP database can be used to make this distinction.
- the virtual tumor approach begins with a high-coverage (60x) whole genome sample sequenced by the 1000 Genomes Project (NA12878).
- the analysis focuses on chromosome 20, rather than the entire genome, for computational efficiency.
- the first step is to randomly divide the sequencing data into several partitions.
- 12 partitions are created from the original 60x data, yielding data partitions of ~5x each.
- This can be accomplished by sorting the BAM by name using SortSam from the Picard tools (http://picard.sourceforge.net) to effectively give the reads a random ordering.
- Each read can then be randomly allocated to one of the partitions and written to a partition-specific BAM file.
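The random allocation step might look like the sketch below, which works on read names only and leaves BAM reading and writing out. Keeping mates together by assigning per read name is an assumption, as are the function name and the fixed seed.

```python
import random


def assign_partitions(read_names, n_partitions=12, seed=0):
    """Map each read name to a partition index in [0, n_partitions)."""
    rng = random.Random(seed)
    assignment = {}
    for name in read_names:
        if name not in assignment:  # mates share a name, so they stay together
            assignment[name] = rng.randrange(n_partitions)
    return assignment


# Hypothetical read names; "read0" appears twice to mimic a read pair.
names = [f"read{i}" for i in range(1000)] + ["read0"]
assignment = assign_partitions(names)
```

Each record would then be written to the output file for `assignment[name]`; with 12 partitions of 60x data, every partition receives roughly 5x coverage on average.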
- partitions can be designated as the tumor and others as the normal and processed through the present subject matter. Any somatic mutations identified in this process are false positives, as they are either germline events that are mis-sampled in the normal or erroneous variants due to sequencing noise identified in the partitions designated as tumor. Because the present subject matter can accept multiple BAM files for each of the tumor and normal, there is no need to merge the partitions a priori. However, because other methods do not have this capability, the individual BAMs can also be merged.
- NA12891, which was sequenced to 60x as part of the 1000 Genomes Project, can be used. Using the published high confidence genotypes for those samples from the 1000 Genomes Project, a set of sites that are heterozygous in NA12891 and homozygous for the reference in NA12878 can be identified.
- SomaticSpike can also be used with MuTect to perform a mixing experiment in-silico.
- this utility attempts to replace a specified fraction of reads (the count being drawn from a binomial distribution) in the NA12878 data with reads from the NA12891 data, thereby simulating a somatic mutation of known location and allele fraction. If there are not enough reads in NA12891 to replace the desired reads in NA12878, the site is skipped.
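A rough sketch of this read-replacement step at a single site is shown below. It is not the SomaticSpike implementation: reads are stand-in strings, the donor list is assumed to hold only reads carrying the alternate allele, and all names are hypothetical.

```python
import random


def reads_to_replace(depth, allele_fraction, rng):
    """Draw the number of reads to swap from Binomial(depth, allele_fraction)."""
    return sum(1 for _ in range(depth) if rng.random() < allele_fraction)


def spike_site(recipient_reads, donor_alt_reads, allele_fraction, rng):
    """Return the spiked read list, or None if the site must be skipped."""
    n = reads_to_replace(len(recipient_reads), allele_fraction, rng)
    if n > len(donor_alt_reads):
        return None  # not enough donor reads: skip this site
    replaced = set(rng.sample(range(len(recipient_reads)), n))
    donors = iter(rng.sample(donor_alt_reads, n))
    return [next(donors) if i in replaced else read
            for i, read in enumerate(recipient_reads)]


rng = random.Random(1)
spiked = spike_site(["ref"] * 30, ["alt"] * 30, 0.2, rng)
```

Drawing the count from a binomial rather than fixing it at depth × fraction reproduces the sampling noise a real mutation at that allele fraction would show.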
- the output of this process is a BAM with the in-silico variants and a set of locations of those variants.
- the sensitivity is then the probability of observing k or more reads given the allelic fraction and depth.
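This probability is a binomial tail, which can be sketched as follows. The minimum supporting read count `k` depends on the detection statistic and the assumed base quality, so it is treated here as a hypothetical input rather than derived.

```python
from math import comb


def binomial_tail(depth, allele_fraction, k):
    """P(X >= k) for X ~ Binomial(depth, allele_fraction)."""
    if k <= 0:
        return 1.0
    # Sum the probabilities of seeing fewer than k supporting reads.
    below = sum(
        comb(depth, i) * allele_fraction**i * (1 - allele_fraction) ** (depth - i)
        for i in range(min(k, depth + 1))
    )
    return max(0.0, 1.0 - below)
```

For a fixed `k`, the tail grows with depth and with allele fraction, which matches the qualitative trend reported for Figure 2b.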
- aspects of the subject matter described herein can be embodied in systems, apparatus, methods, and/or articles depending on the desired configuration.
- various implementations of the subject matter described herein can be realized in digital electronic circuitry, integrated circuitry, specially designed application specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof.
- These various implementations can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which can be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
- machine-readable signal refers to any signal used to provide machine instructions and/or data to a programmable processor.
- the machine-readable medium can store such machine instructions non-transitorily, such as for example as would a non-transient solid state memory or a magnetic hard drive or any equivalent storage medium.
- the machine-readable medium can alternatively or additionally store such machine instructions in a transient manner, such as for example as would a processor cache or other random access memory associated with one or more physical processor cores.
- the subject matter described herein can be implemented on a computer having a display device, such as for example a cathode ray tube (CRT) or a liquid crystal display (LCD) monitor for displaying information to the user and a keyboard and a pointing device, such as for example a mouse or a trackball, by which the user may provide input to the computer.
- Other kinds of devices can be used to provide for interaction with a user as well.
- feedback provided to the user can be any form of sensory feedback, such as for example visual feedback, auditory feedback, or tactile feedback
- touch screens or other touch-sensitive devices such as single or multi-point resistive or capacitive trackpads, voice recognition hardware and software, optical scanners, optical pointers, digital image capture devices and associated interpretation software, and the like.
- the subject matter described herein can be implemented in a computing system that includes a back-end component, such as for example one or more data servers, or that includes a middleware component, such as for example one or more application servers, or that includes a front-end component, such as for example one or more client computers having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described herein, or any combination of such back-end, middleware, or front-end components.
- a client and server are generally, but not exclusively, remote from each other and typically interact through a communication network, although the components of the system can be interconnected by any form or medium of digital data communication.
- Examples of communication networks include, but are not limited to, a local area network (“LAN”), a wide area network (“WAN”), and the Internet.
- Proximal Gap (HC): Remove false positives caused by nearby misaligned small insertion and deletion events. Reject candidate site if there are ≥ 3 reads with insertions within an 11-bp window centered on the candidate mutation, or if there are > 3 reads with deletions within the same 11-bp window.
- Triallelic Site (HC): Reject false positives caused by calling tri-allelic sites where the normal is heterozygous with alleles A/B and MuTect is considering an alternate allele C. Although this is biologically possible, and remains an area for future improvement in mutation detection, calling at these sites generates many false positives and therefore they are currently filtered out by default. However, it may be desirable to review mutations failing only this filter for biological relevance and orthogonal validation and further study the underlying reasons for these false positives.
- Strand Bias (HC): Reject false positives caused by context specific sequencing errors where the vast majority of the alternate alleles are observed in a single direction of reads. We perform this test by stratifying the reads by direction and then applying the core detection statistic on the two datasets.
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Biotechnology (AREA)
- Medical Informatics (AREA)
- Biophysics (AREA)
- Theoretical Computer Science (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Bioinformatics & Computational Biology (AREA)
- Chemical & Material Sciences (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Analytical Chemistry (AREA)
- Molecular Biology (AREA)
- Genetics & Genomics (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The invention relates to a system, a method and a computer program product for detecting variants from sequencing data. The method includes: providing aligned sequencing data and applying filters to the aligned sequencing data; using the filtered data as input, applying a first classifier to determine the presence of an alteration beyond a threshold due to sequencing error, and identifying candidate variants; passing the identified candidate variants through additional filters to remove false positives; and determining, using a second classifier, a somatic status of the identified candidate variants. The invention also relates to corresponding apparatus, systems, techniques and articles.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP13832861.2A EP2891099A4 (fr) | 2012-08-28 | 2013-08-28 | Detecting variants in sequencing data and benchmarking |
US14/633,321 US20150178445A1 (en) | 2012-08-28 | 2015-02-27 | Detecting variants in sequencing data and benchmarking |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201261693987P | 2012-08-28 | 2012-08-28 | |
US61/693,987 | 2012-08-28 | ||
US201361762694P | 2013-02-08 | 2013-02-08 | |
US61/762,694 | 2013-02-08 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/633,321 Continuation-In-Part US20150178445A1 (en) | 2012-08-28 | 2015-02-27 | Detecting variants in sequencing data and benchmarking |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2014036167A1 true WO2014036167A1 (fr) | 2014-03-06 |
Family
ID=50184318
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2013/057128 WO2014036167A1 (fr) | 2012-08-28 | 2013-08-28 | Detecting variants in sequencing data and benchmarking |
Country Status (3)
Country | Link |
---|---|
US (1) | US20150178445A1 (fr) |
EP (1) | EP2891099A4 (fr) |
WO (1) | WO2014036167A1 (fr) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015171660A1 (fr) * | 2014-05-05 | 2015-11-12 | Board Of Regents, The University Of Texas System | Outil d'annotation, d'analyse et de sélection de variants |
US20160273049A1 (en) * | 2015-03-16 | 2016-09-22 | Personal Genome Diagnostics, Inc. | Systems and methods for analyzing nucleic acid |
GB2554883A (en) * | 2016-10-11 | 2018-04-18 | Petagene Ltd | System and method for storing and accessing data |
US20200265922A1 (en) * | 2017-10-10 | 2020-08-20 | Nantomics, Llc | Comprehensive Genomic Transcriptomic Tumor-Normal Gene Panel Analysis For Enhanced Precision In Patients With Cancer |
CN114512186A (zh) * | 2022-02-17 | 2022-05-17 | 南京大学 | 一种在植物基因组中检测体细胞突变的方法 |
WO2023113382A1 (fr) * | 2021-12-16 | 2023-06-22 | Genome Insight Technology, Inc. | Procédé et système d'analyse de séquences |
Families Citing this family (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US12129514B2 (en) | 2009-04-30 | 2024-10-29 | Molecular Loop Biosolutions, Llc | Methods and compositions for evaluating genetic markers |
JP2012525147A (ja) | 2009-04-30 | 2012-10-22 | グッド スタート ジェネティクス, インコーポレイテッド | 遺伝マーカーを評価するための方法および組成物 |
US9163281B2 (en) | 2010-12-23 | 2015-10-20 | Good Start Genetics, Inc. | Methods for maintaining the integrity and identification of a nucleic acid template in a multiplex sequencing reaction |
WO2013058907A1 (fr) | 2011-10-17 | 2013-04-25 | Good Start Genetics, Inc. | Méthodes d'identification de mutations associées à des maladies |
US8209130B1 (en) | 2012-04-04 | 2012-06-26 | Good Start Genetics, Inc. | Sequence assembly |
US10227635B2 (en) | 2012-04-16 | 2019-03-12 | Molecular Loop Biosolutions, Llc | Capture reactions |
EP2971159B1 (fr) | 2013-03-14 | 2019-05-08 | Molecular Loop Biosolutions, LLC | Procédés d'analyse d'acides nucléiques |
US10851414B2 (en) | 2013-10-18 | 2020-12-01 | Good Start Genetics, Inc. | Methods for determining carrier status |
WO2015175530A1 (fr) | 2014-05-12 | 2015-11-19 | Gore Athurva | Procédés pour la détection d'aneuploïdie |
WO2016040446A1 (fr) | 2014-09-10 | 2016-03-17 | Good Start Genetics, Inc. | Procédés permettant la suppression sélective de séquences non cibles |
WO2016048829A1 (fr) | 2014-09-24 | 2016-03-31 | Good Start Genetics, Inc. | Commande de procédé pour accroître la robustesse de tests génétiques |
EP4095261B1 (fr) | 2015-01-06 | 2025-05-28 | Molecular Loop Biosciences, Inc. | Criblage de variantes structurales |
MA42420A (fr) | 2015-05-13 | 2018-05-23 | Agenus Inc | Vaccins pour le traitement et la prévention du cancer |
US10395759B2 (en) | 2015-05-18 | 2019-08-27 | Regeneron Pharmaceuticals, Inc. | Methods and systems for copy number variant detection |
GB2555551A (en) * | 2015-07-07 | 2018-05-02 | Farsight Genome Systems Inc | Methods and systems for sequencing-based variant detection |
JP6675164B2 (ja) * | 2015-07-28 | 2020-04-01 | 株式会社理研ジェネシス | 変異判定方法、変異判定プログラムおよび記録媒体 |
WO2017136720A1 (fr) * | 2016-02-05 | 2017-08-10 | Good Start Genetics, Inc. | Détection de variants de tests de séquençage |
KR102465122B1 (ko) | 2016-02-12 | 2022-11-09 | 리제너론 파마슈티칼스 인코포레이티드 | 비정상적인 핵형을 검출하기 위한 방법 및 시스템 |
KR101882866B1 (ko) | 2016-05-25 | 2018-08-24 | 삼성전자주식회사 | 시료의 교차 오염 정도를 분석하는 방법 및 장치 |
WO2018144782A1 (fr) * | 2017-02-01 | 2018-08-09 | The Translational Genomics Research Institute | Procédés de détection de variants somatiques et de lignée germinale dans des tumeurs impures |
KR20230152172A (ko) * | 2017-03-19 | 2023-11-02 | 오펙-에슈콜롯 리서치 앤드 디벨롭먼트 엘티디 | K-부정합 검색을 위한 필터를 생성하는 시스템 및 방법 |
WO2019016353A1 (fr) * | 2017-07-21 | 2019-01-24 | F. Hoffmann-La Roche Ag | Classification de mutations somatiques à partir d'un échantillon hétérogène |
KR102035615B1 (ko) * | 2017-08-07 | 2019-10-23 | 연세대학교 산학협력단 | 유전자 패널에 기초한 염기서열의 변이 검출방법 및 이를 이용한 염기서열의 변이 검출 디바이스 |
JP2021503922A (ja) | 2017-11-28 | 2021-02-15 | グレイル, インコーポレイテッドGrail, Inc. | ターゲットシーケンシングのためのモデル |
KR20210009299A (ko) * | 2018-02-27 | 2021-01-26 | 코넬 유니버시티 | 게놈-와이드 통합을 통한 순환 종양 dna의 초민감 검출 |
MA52363A (fr) | 2018-04-26 | 2021-03-03 | Agenus Inc | Compositions peptidiques de liaison à une protéine de choc thermique (hsp) et leurs méthodes d'utilisation |
JP7479367B2 (ja) | 2018-11-29 | 2024-05-08 | ヴェンタナ メディカル システムズ, インク. | 代表的なDNAシーケンシングによる個別化されたctDNA疾患のモニタリング |
JP7340021B2 (ja) | 2018-12-23 | 2023-09-06 | エフ. ホフマン-ラ ロシュ アーゲー | 予測腫瘍遺伝子変異量に基づいた腫瘍分類 |
CN114676229B (zh) * | 2022-04-20 | 2023-01-24 | 国网安徽省电力有限公司滁州供电公司 | 一种技改大修工程档案管理系统及管理方法 |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2011050341A1 (fr) * | 2009-10-22 | 2011-04-28 | National Center For Genome Resources | Méthodes et systèmes pour l'analyse de séquençage médical |
US20110257896A1 (en) * | 2010-01-07 | 2011-10-20 | Affymetrix, Inc. | Differential Filtering of Genetic Data |
-
2013
- 2013-08-28 EP EP13832861.2A patent/EP2891099A4/fr not_active Withdrawn
- 2013-08-28 WO PCT/US2013/057128 patent/WO2014036167A1/fr unknown
-
2015
- 2015-02-27 US US14/633,321 patent/US20150178445A1/en not_active Abandoned
Non-Patent Citations (6)
Title |
---|
CIBULSKIS KRISTIAN ET AL.: "Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples.", NATURE BIOTECHNOLOGY, vol. 31, no. 3, 2013, pages 213 - 219, XP055256219 * |
DEISBOECK THOMAS S. ET AL.: "Advancing Cancer Systems Biology: Introducing the Center for the Development of a Virtual Tumor, CViT", CANCER INFORMATICS, vol. 5, 2007, pages 1 - 8, XP055256215 * |
HENG LI.: "A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data.", BIOINFORMATICS, vol. 27, no. 21, 2011, pages 2987 - 2993, XP055256214 * |
PENG ZHIYU ET AL.: "Comprehensive analysis of RNA-Seq data reveals extensive RNA editing in a human transcriptome.", NATURE BIOTECHNOLOGY, vol. 30, no. 3, 2012, pages 253 - 260, XP055110036 * |
See also references of EP2891099A4 * |
ZHANG ZHENGDONG D. ET AL.: "Identification of genomic indels and structural variations using split reads.", BMC GENOMICS, vol. 12, no. 375, 2011, pages 1 - 12, XP021104728, Retrieved from the Internet <URL:http://www.biomedcentral.com/1471-2164/12/375> * |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2541143A (en) * | 2014-05-05 | 2017-02-08 | Univ Texas | Variant annotation, analysis and selection tool |
WO2015171660A1 (fr) * | 2014-05-05 | 2015-11-12 | Board Of Regents, The University Of Texas System | Outil d'annotation, d'analyse et de sélection de variants |
US20180119230A1 (en) * | 2015-03-16 | 2018-05-03 | Personal Genome Diagnostics, Inc. | Systems and methods for analyzing nucleic acid |
WO2016149261A1 (fr) | 2015-03-16 | 2016-09-22 | Personal Genome Diagnostics, Inc. | Systèmes et procédés pour analyser l'acide nucléique |
CN107750279A (zh) * | 2015-03-16 | 2018-03-02 | 个人基因组诊断公司 | 核酸分析系统和方法 |
US20160273049A1 (en) * | 2015-03-16 | 2016-09-22 | Personal Genome Diagnostics, Inc. | Systems and methods for analyzing nucleic acid |
EP3271848A4 (fr) * | 2015-03-16 | 2018-12-05 | Personal Genome Diagnostics Inc. | Systèmes et procédés pour analyser l'acide nucléique |
GB2554883A (en) * | 2016-10-11 | 2018-04-18 | Petagene Ltd | System and method for storing and accessing data |
US11176103B2 (en) | 2016-10-11 | 2021-11-16 | Petagene Ltd | System and method for storing and accessing data |
US20200265922A1 (en) * | 2017-10-10 | 2020-08-20 | Nantomics, Llc | Comprehensive Genomic Transcriptomic Tumor-Normal Gene Panel Analysis For Enhanced Precision In Patients With Cancer |
WO2023113382A1 (fr) * | 2021-12-16 | 2023-06-22 | Genome Insight Technology, Inc. | Procédé et système d'analyse de séquences |
CN114512186A (zh) * | 2022-02-17 | 2022-05-17 | 南京大学 | 一种在植物基因组中检测体细胞突变的方法 |
CN114512186B (zh) * | 2022-02-17 | 2025-02-11 | 南京大学 | 一种在植物基因组中检测体细胞突变的方法 |
Also Published As
Publication number | Publication date |
---|---|
EP2891099A4 (fr) | 2016-04-20 |
US20150178445A1 (en) | 2015-06-25 |
EP2891099A1 (fr) | 2015-07-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2014036167A1 (fr) | Detecting variants in sequencing data and benchmarking | |
Cibulskis et al. | Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples | |
US20210012859A1 (en) | Method For Determining Genotypes in Regions of High Homology | |
Li | Toward better understanding of artifacts in variant calling from high-coverage samples | |
US11961589B2 (en) | Models for targeted sequencing | |
US10734117B2 (en) | Apparatuses and methods for determining a patient's response to multiple cancer drugs | |
US20240105282A1 (en) | Methods for detecting bialllic loss of function in next-generation sequencing genomic data | |
EP4070318A1 (fr) | Systèmes et procédés d'automatisation d'appels d'expression d'arn dans un pipeline de prédiction de cancer | |
US11842794B2 (en) | Variant calling in single molecule sequencing using a convolutional neural network | |
CN106462670A (zh) | 超深度测序中的罕见变体召集 | |
Witt et al. | Apportioning archaic variants among modern populations | |
WO2018150378A1 (fr) | Détection de contamination croisée dans des données de séquençage à l'aide de techniques de régression | |
Besedina et al. | Copy number losses of oncogenes and gains of tumor suppressor genes generate common driver mutations | |
US20200013484A1 (en) | Machine learning variant source assignment | |
Scheffler et al. | Somatic small-variant calling methods in Illumina DRAGEN™ Secondary Analysis | |
CN118632935A (zh) | 检测无细胞rna中的交叉污染 | |
US20200105374A1 (en) | Mixture model for targeted sequencing | |
CN113195741A (zh) | 从循环核酸中鉴定全基因组序列数据中的全局序列特征 | |
do Nascimento et al. | Copy number variations detection: unravelling the problem in tangible aspects | |
Narzisi et al. | Lancet: genome-wide somatic variant calling using localized colored DeBruijn graphs | |
Maruzani et al. | Predicting high confidence ctDNA somatic variants with ensemble machine learning models | |
Zhao et al. | UVC: universality-based calling of small variants using | |
Swenson | Detection of artefacts in FFPE-sample sequence data | |
Wang | Transcriptome and genome analysis based on alignment-free protocols | |
Derryberry | Benchmarking of single nucleotide somatic variant calling |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 13832861 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |