CN109680060A - Methylation markers and their application in tumor diagnosis and classification
- Publication number
- CN109680060A (application CN201710962721.4A)
- Authority
- CN
- China
- Prior art keywords
- methylation
- cancer
- cpg
- marker
- tumour
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q1/6886—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/154—Methylation markers
Abstract
The invention discloses methylation markers for early tumor diagnosis. The markers comprise the CpG islands corresponding to 248 probes of the HumanMethylation450 BeadChip Kit, and they distinguish tumor tissue from adjacent normal tissue well for the tumors in the training set. A machine learning model built from the methylation levels of the markers can be used clinically to predict the risk of the 12 tumor types in the training set, and it also applies to other tumors, where it performs well with an area under the curve of 0.94. This shows that the methylation levels of the CpG islands screened by the invention can be used for early diagnosis of nearly all TCGA tumors. The invention also provides the CpG islands corresponding to 515 methylation markers of the HumanMethylation450 BeadChip Kit for tumor typing and the identification of tumor types.
Description
Technical field
The invention belongs to the field of biomedicine and specifically relates to methylation markers and their application in tumor diagnosis and classification.
Background
At present, early diagnosis of tumors remains a challenge. Compared with gene expression levels, DNA methylation levels are more stable, and both the DNA methylation level of circulating tumor cells and that of circulating tumor DNA free in the blood can be detected in human peripheral blood. DNA methylation can therefore serve as a marker for tumor diagnosis. Because DNA methylation is a reversible process, it can also serve as a drug target.
Previous cancer-related methylation analyses focused mainly on CpG islands in gene promoter regions and ignored the relationship between tumors and aberrant methylation in regions outside promoters and CpG islands. Moreover, when analyzing HumanMethylation450K chip data, they worked mainly at the level of genes, considering only the CpG islands that map to gene promoter regions and ignoring probes located in intergenic regions, so part of the information was lost. Before the present invention, other researchers had also trained on one or four tumor types and selected markers for early tumor diagnosis. However, those markers were shown to work only in the one or four tumor types of the training set, and they were not shown to apply to tumors outside the training set.
Summary of the invention
To overcome the above shortcomings of the prior art, the invention provides a feature selection method for tumor markers, with the following steps:
(1) Differential methylation analysis is performed at the level of CpG islands. The invention uses the CpG islands of all Human450K chip probes (including non-promoter regions) for subsequent analysis.
(2) Tumor types with at least a certain number of paired normal-tumor samples are used as the training set.
(3) Using XGBoost and recursive feature elimination, and weighing clinical cost against prediction accuracy, the CpG islands ranked highest by importance are selected as features.
In step (2), the number of paired samples per tumor type is preferably greater than 10. In a specific embodiment, the invention selects 12 tumor types as the training set: bladder urothelial carcinoma (BLCA), breast invasive carcinoma (BRCA), colon cancer (COAD), esophageal cancer (ESCA), kidney renal clear cell carcinoma (KIRC), kidney renal papillary cell carcinoma (KIRP), liver cancer (LIHC), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), pancreatic cancer (PAAD), prostate cancer (PRAD) and thyroid cancer (THCA).
In step (3), the CpG islands are those corresponding to the 248 probes of the Infinium HumanMethylation450 BeadChip Kit listed in Table 4. Preferably, in step (3), the invention selects 7 CpG islands as features. These are the CpG islands corresponding to 7 probes of the Infinium HumanMethylation450 BeadChip Kit: cg10995381, cg27019093, cg04396850, cg25351606, cg02064267, cg20300343 and cg13974761. The genomic locations of the corresponding CpG islands are given in Table 1, and the probes are numbered 5, 2, 1, 15, 12, 4 and 72, respectively, in Table 4. Table 4 lists all 248 potential tumor diagnostic markers that were screened.
The invention also provides the use of CpG islands as methylation markers for early tumor diagnosis, the CpG islands being those corresponding to the 248 probes of the Infinium HumanMethylation450 BeadChip Kit listed in Table 4. Preferably, 7 CpG islands are used as the methylation markers for early tumor diagnosis. They correspond to 7 probes of the Infinium HumanMethylation450 BeadChip Kit, namely cg10995381, cg27019093, cg04396850, cg25351606, cg02064267, cg20300343 and cg13974761, and the corresponding CpG islands are given in Table 1. The tumors are preferably 12 human tumor types: bladder urothelial carcinoma (BLCA), breast invasive carcinoma (BRCA), colon cancer (COAD), esophageal cancer (ESCA), kidney renal clear cell carcinoma (KIRC), kidney renal papillary cell carcinoma (KIRP), liver cancer (LIHC), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), pancreatic cancer (PAAD), prostate cancer (PRAD) and thyroid cancer (THCA).
In the invention, the biological processes and signaling pathways that the CpG islands can indirectly affect include cell division, DNA replication, the cell cycle and similar processes and pathways. This holds in particular for 3 of the CpG islands, namely those corresponding to the probes cg04396850, cg25351606 and cg20300343, which are numbered 1, 15 and 4, respectively, in Table 4.
In the invention, the CpG islands corresponding to the 248 probes in Table 4 can serve as markers not only for the 12 tumor types in the training set but also for other tumors, because they also perform relatively well on 9 further tumor types outside the training set. The other tumors include: cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC), cholangiocarcinoma (CHOL), glioblastoma multiforme (GBM), pheochromocytoma and paraganglioma (PCPG), rectal cancer (READ), sarcoma (SARC), cutaneous melanoma (SKCM), stomach adenocarcinoma (STAD) and thymoma (THYM). Preferably, the CpG islands are those corresponding to the 7 probes in Table 1.
The invention also provides a logistic regression model built from the methylation data of the CpG islands corresponding to the 248 probes in Table 4. The model is constructed by calling the LogisticRegression function in linear_model of sklearn (version 0.18.1) with default parameters. The logistic regression model successfully distinguishes tumor from normal tissue in the training set and also performs well on three independent validation data sets. Preferably, the CpG islands are those corresponding to the 7 probes in Table 1.
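The model construction described above can be sketched as follows. This is a minimal illustration on synthetic β values, not the patent's data: the sample counts, the Beta distributions used to simulate methylation levels, and all variable names are assumptions. Note also that the default parameters of LogisticRegression (for example the default solver) have changed since scikit-learn 0.18.1, so a current release need not reproduce results obtained with that version.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical toy data: rows are samples, columns are beta values (0..1)
# for 7 CpG islands; label 1 = tumor, 0 = adjacent normal tissue.
n_per_class, n_features = 100, 7
normal = rng.beta(2, 8, size=(n_per_class, n_features))  # mostly low methylation
tumor = rng.beta(6, 4, size=(n_per_class, n_features))   # mostly high methylation
X = np.vstack([normal, tumor])
y = np.array([0] * n_per_class + [1] * n_per_class)

# Default parameters, as the text describes.
model = LogisticRegression()
model.fit(X, y)

# Predicted tumor probability for each sample and the training AUC.
tumor_prob = model.predict_proba(X)[:, 1]
train_auc = roc_auc_score(y, tumor_prob)
print(f"training AUC on toy data: {train_auc:.3f}")
```

With real data, X would hold the measured methylation levels of the selected CpG islands, and the AUC would be reported on held-out samples rather than on the training set.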
In the invention, the logistic regression model (classification model) distinguishes tumor tissue from adjacent normal tissue for the 12 tumor types in the training set. It can also distinguish tumor tissue from adjacent normal tissue for other tumors, because it also performs relatively well on the remaining TCGA tumor types. The other tumors include: cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC), cholangiocarcinoma (CHOL), glioblastoma multiforme (GBM), pheochromocytoma and paraganglioma (PCPG), rectal cancer (READ), sarcoma (SARC), cutaneous melanoma (SKCM), stomach adenocarcinoma (STAD) and thymoma (THYM). Preferably, the CpG islands are those corresponding to the 7 probes in Table 1.
The invention also provides the use of CpG islands as methylation markers for distinguishing different tumor types, the CpG islands being those corresponding to the 515 probes of the Infinium HumanMethylation450 BeadChip Kit listed in Table 5. Preferably, the CpG islands are those corresponding to the 12 probes of the Infinium HumanMethylation450 BeadChip Kit listed in Table 2; these 12 probes are numbered 83, 5, 114, 158, 203, 21, 12, 2, 22, 18, 66 and 23 in Table 5. Table 5 lists all 515 potential methylation markers for distinguishing different tumors.
In the invention, the methylation level of the CpG islands can be measured by techniques such as methylation chips, methylation-specific PCR or genome methylation sequencing. The CpG islands are those corresponding to the probes in Tables 1, 2, 4 and 5.
The invention also provides a kit comprising reagents for measuring the DNA methylation level of one or more of the methylation markers in Table 1, 2, 4 or 5.
The invention also provides an early tumor diagnosis product based on human blood, comprising reagents for measuring the DNA methylation level of one or more of the methylation markers in Table 1, 2, 4 or 5.
The invention also provides a gene chip comprising reagents for measuring the DNA methylation level of one or more of the methylation markers in Table 1, 2, 4 or 5.
The invention also provides the use of the human cancer-related methylation markers, the kit, the early tumor diagnosis product or the gene chip in the preparation of products for early diagnosis and typing of tumors. The tumors are preferably human tumors, including: bladder urothelial carcinoma (BLCA), breast invasive carcinoma (BRCA), colon cancer (COAD), esophageal cancer (ESCA), kidney renal clear cell carcinoma (KIRC), kidney renal papillary cell carcinoma (KIRP), liver cancer (LIHC), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), pancreatic cancer (PAAD), prostate cancer (PRAD), thyroid cancer (THCA), cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC), cholangiocarcinoma (CHOL), glioblastoma multiforme (GBM), pheochromocytoma and paraganglioma (PCPG), rectal cancer (READ), sarcoma (SARC), cutaneous melanoma (SKCM), stomach adenocarcinoma (STAD), thymoma (THYM), adrenocortical carcinoma, kidney chromophobe cell carcinoma, acute myeloid leukemia, lower-grade glioma and ovarian serous cystadenocarcinoma.
The invention also provides a screening method for cancer diagnosis methylation markers, comprising the following steps:
(1) Data collection: DNA methylation data for multiple cancers are collected (including but not limited to methylation chip data, whole-genome methylation sequencing data or other methylation data; in a specific embodiment of the invention, HumanMethylation450K methylation chip data are used). Tumor types with more than a certain number of matched normal-tumor sample pairs (preferably at least 10 pairs) are used as the training set. Other tumor types with normal-tumor pairs (for example, those with fewer than 10 but more than 1 pair) serve as independent validation data sets; other public data sets (such as methylation data sets in GEO or other public databases) can also serve as independent validation data sets.
(2) Data processing (taking methylation chip data as an example):
a) Delete probes (CpG islands) that map to sex chromosomes.
b) Delete probes that have missing values in more than 30% of the samples.
c) Delete probes that have missing values in 30% of the samples.
d) Impute the remaining missing values with the EMimpute_array algorithm of the LSimpute software.
e) Correct the bias of type II probes with the BMIQ software.
f) Set β values greater than 1 to 1 and β values less than 0 to 0.
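The filtering and clipping steps above can be sketched in a few lines of NumPy. This is a hedged illustration only: the function name preprocess_beta, the chromosome labels and the toy matrix are assumptions, per-probe mean imputation is used as a simple stand-in for the EMimpute_array algorithm, the BMIQ correction of step e) is not reproduced, and step c) is not implemented separately because, as worded, it overlaps step b).

```python
import numpy as np

def preprocess_beta(beta, chrom, max_missing_frac=0.30):
    """beta: probes x samples array (NaN = missing); chrom: chromosome per probe."""
    beta = np.asarray(beta, dtype=float)
    chrom = np.asarray(chrom)

    # a) drop probes that map to sex chromosomes
    keep = ~np.isin(chrom, ["chrX", "chrY"])

    # b) drop probes with missing values in more than 30% of the samples
    missing_frac = np.isnan(beta).mean(axis=1)
    keep &= missing_frac <= max_missing_frac
    beta = beta[keep]

    # d) stand-in imputation: per-probe mean (NOT the EMimpute_array algorithm)
    probe_mean = np.nanmean(beta, axis=1, keepdims=True)
    beta = np.where(np.isnan(beta), probe_mean, beta)

    # e) a type II probe correction such as BMIQ would go here (not reproduced)

    # f) force beta values into the interval [0, 1]
    beta = np.clip(beta, 0.0, 1.0)
    return beta, keep

# Hypothetical 4-probe, 4-sample example
beta = np.array([
    [0.10, 0.20, np.nan, 0.30],    # autosomal, 25% missing: kept and imputed
    [0.50, np.nan, np.nan, 0.70],  # autosomal, 50% missing: dropped
    [0.90, 0.80, 0.70, 0.60],      # on chrX: dropped
    [1.05, -0.02, 0.40, 0.50],     # out-of-range values: clipped
])
chrom = ["chr1", "chr2", "chrX", "chr3"]
clean, kept = preprocess_beta(beta, chrom)
print(clean)
```

The boolean mask returned alongside the cleaned matrix lets the caller apply the same probe selection to annotation tables or to validation data sets.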
(3) Definition of differentially methylated CpG islands (taking methylation chip data as an example):
a) Calculate Δβ: for each probe (CpG island), Δβ is the mean β value across all tumor samples minus the mean β value across all normal tissue samples.
b) Calculate and correct p values: p values are calculated with a paired t-test and corrected with the Benjamini/Hochberg method.
c) CpG islands with an absolute Δβ greater than 0.2 and a corrected p value less than 0.05 are taken as differentially methylated CpG islands.
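The definition above translates directly into code. The sketch below is an illustration on synthetic data: the function names, the matrix layout (probes in rows, matched patient pairs in columns) and the simulated shift of 0.4 for the first five probes are assumptions. The Benjamini/Hochberg correction is written out by hand so that the example depends only on NumPy and SciPy.

```python
import numpy as np
from scipy.stats import ttest_rel

def benjamini_hochberg(p):
    """Return Benjamini/Hochberg adjusted p values."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # enforce monotonicity, starting from the largest p value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted

def differential_cpg(tumor, normal, min_delta=0.2, alpha=0.05):
    """tumor, normal: probes x pairs arrays; column j is the same patient in both."""
    delta_beta = tumor.mean(axis=1) - normal.mean(axis=1)   # a) delta beta
    _, p = ttest_rel(tumor, normal, axis=1)                 # b) paired t-test
    p_adj = benjamini_hochberg(p)                           # b) BH correction
    selected = (np.abs(delta_beta) > min_delta) & (p_adj < alpha)  # c) thresholds
    return delta_beta, p_adj, selected

# Hypothetical data: 50 probes, 20 matched pairs, first 5 probes hypermethylated
rng = np.random.default_rng(1)
n_probes, n_pairs = 50, 20
normal = rng.uniform(0.2, 0.4, size=(n_probes, n_pairs))
tumor = normal + rng.normal(0.0, 0.02, size=(n_probes, n_pairs))
tumor[:5] += 0.4
delta_beta, p_adj, selected = differential_cpg(tumor, normal)
print(np.flatnonzero(selected))
```

Both conditions must hold: a probe with a tiny but highly significant shift is excluded by the Δβ threshold, and a probe with a large but noisy shift is excluded by the corrected p value.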
(4) Feature selection: the training set is trained with the XGBoost package in Python to obtain an importance score for each feature. In each cycle, the feature with the lowest score is deleted and training continues on the remaining features. Weighing clinical cost against prediction accuracy, a certain number of the most important features are finally selected as the cancer diagnosis methylation markers, and these CpG islands are used to build a machine learning model. The number can be 7 to 20 (in a specific embodiment of the invention, the 7 most important features are finally selected and the 7 corresponding CpG islands are used to build the machine learning model).
(5) Model construction: the invention builds a logistic regression model whose area under the receiver operating characteristic curve (AUC) in training is 0.99 and whose 5-fold cross-validation accuracy is 0.97. The AUCs on three GEO validation data sets reach 0.97, 0.95 and 0.93, respectively, showing that the model built by the invention is reproducible.
(6) Biological process and signaling pathway enrichment analysis: the invention first selects genes whose mRNA expression is strongly correlated with the methylation level of the above-mentioned 4 CpG islands (absolute Spearman correlation coefficient greater than 0.5) and performs enrichment analysis with DAVID. Genes strongly correlated with the methylation level of the three CpG islands cg04396850, cg25351606 and cg20300343 are found to be mainly enriched in biological processes such as cell division, mitosis, sister chromatid cohesion and DNA replication, and in signaling pathways such as the cell cycle. This indicates that the 7 markers screened by the invention are indeed related to tumor proliferation and that their methylation changes are a common feature of tumor cells.
(7) Application to other tumors: to verify that the 7 selected markers can distinguish normal from tumor tissue not only for the 12 tumor types in the training set but also for tumor types outside it, the above model is applied to other tumors. The results show that the model still performs well on tumors outside the training set, with an AUC of 0.94. This indicates that the 7 screened markers serve as markers not only for the 12 tumor types of the training set but also for other tumors, and can be used for early tumor diagnosis.
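The correlation filter used in step (6) before the DAVID enrichment analysis can be illustrated as follows. This is a sketch under stated assumptions: the gene names GENE_A to GENE_C, the simulated expression values and the helper correlated_genes are hypothetical, and the enrichment analysis itself is performed in DAVID and is not reproduced here.

```python
import numpy as np
from scipy.stats import spearmanr

def correlated_genes(methylation, expression, gene_names, min_abs_rho=0.5):
    """methylation: one value per sample for a single CpG island.
    expression: genes x samples array of mRNA levels for the same samples."""
    hits = []
    for name, expr in zip(gene_names, expression):
        rho, _ = spearmanr(methylation, expr)
        if abs(rho) > min_abs_rho:      # keep strongly correlated genes only
            hits.append((name, float(rho)))
    return hits

# Hypothetical data for one CpG island and three genes across 60 samples
rng = np.random.default_rng(2)
n_samples = 60
methylation = rng.uniform(0.0, 1.0, n_samples)
expression = np.vstack([
    10.0 - 8.0 * methylation + rng.normal(0, 0.5, n_samples),  # GENE_A: negative link
    rng.normal(5.0, 1.0, n_samples),                           # GENE_B: unrelated
    3.0 + 6.0 * methylation + rng.normal(0, 0.5, n_samples),   # GENE_C: positive link
])
hits = correlated_genes(methylation, expression, ["GENE_A", "GENE_B", "GENE_C"])
for name, rho in hits:
    print(name, round(rho, 2))
```

The list of gene names that passes the filter is what would be submitted to the enrichment tool; using the absolute value keeps both positively and negatively correlated genes.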
The invention also provides a screening method for tumor typing methylation markers, comprising the following steps:
(1) The DNA methylation data of tumor patients with multiple cancers (including but not limited to methylation chip data, whole-genome methylation sequencing data or other methylation data) are merged.
Taking methylation chip data as an example, the DNA methylation chip data can come from, but are not limited to, the TCGA data set.
(2) Variance screening: delete probes (or CpG islands) whose variance (std) across all samples is less than 0.2.
(3) Differential methylation screening: for each cancer, select the several top-ranked probes that are differentially methylated between that cancer and the remaining cancers (mean difference > 0.3 and FDR < 0.01). The CpG islands screened out in this way distinguish different tumors well and are retained for further screening. The number of probes can be set by weighing cost against accuracy; for example, in a specific embodiment of the invention, the top 20 probes differentially methylated between the cancer and the remaining cancers are selected.
(4) Recursive feature elimination: recursive feature elimination (with the xgboost algorithm) is used to further screen out 12 probes (CpG islands), and a OneVsRestClassifier multi-class model built from the methylation levels of the corresponding CpG islands distinguishes each tumor type from the others well (Table 3).
(5) Validation on independent data sets: the invention validates on public GEO data sets for breast cancer (GSE69914), colorectal cancer (GSE48684), prostate cancer (GSE73549), ovarian cancer (GSE65820), lung adenocarcinoma (GSE66836) and liver cancer (GSE89852), and the results are also very good (Table 3), demonstrating the reproducibility and validity of the selected CpG islands and of the multi-class model built on them. The independent data sets can also come from other sources, including tumor data sets generated in-house.
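Steps (2) and (3) of the typing marker screen can be sketched as below. Several points are assumptions made for illustration: the text does not name the statistical test behind the FDR, so a Welch t-test with Benjamini/Hochberg correction is used here; probes are ranked by absolute mean difference; and the function names, the four tumor labels T0 to T3 and the simulated data are hypothetical.

```python
import numpy as np
from scipy.stats import ttest_ind

def bh_adjust(p):
    """Benjamini/Hochberg adjusted p values."""
    p = np.asarray(p, dtype=float)
    order = np.argsort(p)
    ranked = p[order] * p.size / np.arange(1, p.size + 1)
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(p.size)
    out[order] = np.clip(ranked, 0.0, 1.0)
    return out

def top_probes_per_cancer(beta, labels, top_n=20, min_std=0.2,
                          min_diff=0.3, max_fdr=0.01):
    """beta: samples x probes; labels: cancer type per sample.
    Returns {cancer type: indices of its top differentially methylated probes}."""
    labels = np.asarray(labels)
    # step (2): drop probes whose std across all samples is below min_std
    variable = np.flatnonzero(beta.std(axis=0) >= min_std)
    selected = {}
    for cancer in np.unique(labels):
        inside = beta[labels == cancer][:, variable]
        outside = beta[labels != cancer][:, variable]
        diff = inside.mean(axis=0) - outside.mean(axis=0)
        _, p = ttest_ind(inside, outside, axis=0, equal_var=False)
        fdr = bh_adjust(p)
        # step (3): mean difference > 0.3 and FDR < 0.01, then keep the top_n
        passing = np.flatnonzero((np.abs(diff) > min_diff) & (fdr < max_fdr))
        ranked = passing[np.argsort(-np.abs(diff[passing]))][:top_n]
        selected[str(cancer)] = variable[ranked]
    return selected

# Hypothetical data: 4 tumor types, 2 type-specific probes each, 12 flat probes
rng = np.random.default_rng(3)
cancers = ["T0", "T1", "T2", "T3"]
labels = np.repeat(cancers, 30)
beta = rng.uniform(0.4, 0.6, size=(labels.size, 20))
for k, cancer in enumerate(cancers):
    for probe in (2 * k, 2 * k + 1):
        beta[:, probe] = np.where(labels == cancer, 0.9, 0.1)
        beta[:, probe] += rng.normal(0.0, 0.03, labels.size)
beta = np.clip(beta, 0.0, 1.0)
selected = top_probes_per_cancer(beta, labels)
print({cancer: sorted(idx.tolist()) for cancer, idx in selected.items()})
```

The union of the per-cancer probe lists is the candidate pool that is passed on to recursive feature elimination; the same probe can be selected for more than one cancer, which is why the pooled count can be smaller than top_n times the number of cancers.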
The beneficial effects of the present invention are:
The invention performs differential methylation analysis at the level of CpG islands. Previously, most studies focused only on the methylation level of CpG islands in gene promoter regions, whereas the invention carries out subsequent analysis on the CpG islands of all Human450K chip probes (including non-promoter regions).
The machine learning model built from the methylation levels of the 7 CpG islands can be used clinically to predict the risk of having a tumor. The area under the receiver operating characteristic curve in training is 0.99 or higher, and the areas under the curve on three independent validation data sets reach 0.97, 0.95 and 0.93, respectively.
Biological process and signaling pathway enrichment analysis shows that genes strongly correlated with the methylation level of 3 of these CpG islands function mainly in biological processes such as cell division, sister chromatid separation and DNA replication, and in signaling pathways such as the cell cycle. This indicates that aberrant methylation of these 3 markers is a common feature of tumors.
The machine learning model built from the methylation levels of the markers can be used clinically to predict the risk of the 12 tumor types in the training set, and it also applies to other tumors: it performs well on the remaining tumor types, with an area under the curve of 0.94. This shows that the CpG islands screened by the invention can be used for early diagnosis of nearly all tumors.
The invention has also selected 12 CpG islands for distinguishing different tumor types. The average area under the curve in training is 0.97 or higher, and performance on independent validation data sets is also good (average area under the curve of 0.95).
The methylation markers of the invention can be used for early tumor diagnosis. By detecting the methylation level of circulating tumor DNA (ctDNA) free in the blood, they can be used for early diagnosis and typing of tumors, to predict the probability that a subject has a tumor and which type of tumor it is.
Table 1. Information on the 7 CpG islands used for tumor diagnosis
Table 2. Information on the 12 CpG islands used to distinguish different tumors
Table 3. The 26 tumor types used in training to distinguish different tumor types, and their AUCs
Table 4. Information on the top 248 CpG islands that can be used for the tumor diagnosis classifier
Table 5. Information on the top 515 CpG islands that can be used for the classifier distinguishing different tumors
Brief description of the drawings
Fig. 1: a. Receiver operating characteristic (ROC) curve for training on the training set; b. learning curve during training; c, d, e. ROC curves for validation on the three independent validation data sets GSE76938, GSE48684 and GSE37754, respectively; f. clustering diagram of the training set obtained from the methylation levels of the 7 markers, whose probes are, from left to right, cg10995381, cg27019093, cg04396850, cg25351606, cg02064267, cg20300343 and cg13974761. The 7 markers divide all samples into two major classes, tumor and normal tissue; in the figure, the upper half is tumor tissue and the lower half is normal tissue. In the right-hand block, tumor types are, from top to bottom, BLCA, BRCA, COAD, ESCA, KIRC, KIRP, LIHC, LUAD, LUSC, PAAD, PRAD and THCA, and tissue types are, from top to bottom, normal tissue and tumor tissue; g. ROC curve for validation on the remaining 9 TCGA tumor types.
Specific embodiment
The invention is described in further detail with reference to the following specific embodiments and the drawings. Except for what is specifically mentioned below, the processes, conditions and experimental methods for carrying out the invention are general and common knowledge in the art, and the invention places no particular restriction on them.
Embodiment 1
For the diagnostic methylation markers, feature selection is first carried out on samples of 12 tumor types in TCGA (paired tumor and normal tissue), with the following steps:
(1) Train with XGBoost. The parameters are tuned automatically by 5-fold cross-validation with the GridSearchCV function in sklearn. After training, an importance score is obtained for each feature (the methylation level of a probe (CpG island)), and the feature with the lowest score is deleted.
(2) Repeat step (1) on the data formed by the remaining features until one feature remains.
(3) Too many features make clinical testing too costly, while too few features reduce prediction accuracy. Weighing both, the CpG islands ranked in the top 7 by feature importance score are selected as the markers for tumor diagnosis.
(4) A logistic regression model is built from the methylation levels of the 7 CpG islands and trained on tumor samples and adjacent normal tissue samples of the 12 tumor types. The ROC curve in training and the 5-fold cross-validation accuracy are shown in Fig. 1 a, b.
(5) The markers and the classifier are tested on three independent validation data sets (GSE69914, GSE48684 and GSE76938) (Fig. 1 c, d, e) and further tested on the 9 remaining tumor types (Fig. 1 g).
(6) Unsupervised hierarchical clustering on the methylation levels of the 7 CpG islands shows that the 7 markers divide all samples well into two major classes, tumor samples and normal samples (Fig. 1 f).
In this embodiment, the paired tumor and normal tissue samples can also come from other data sets.
In this embodiment, the tumor types can also be chosen according to actual needs.
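The elimination loop of steps (1) to (3) in this embodiment can be sketched as follows. To keep the dependencies small, scikit-learn's GradientBoostingClassifier is used here as a stand-in for XGBoost; it is a different implementation of gradient boosting, and the per-round GridSearchCV tuning is omitted. The feature names cpg_0 to cpg_7, the simulated data and the helper recursive_elimination are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

def recursive_elimination(X, y, feature_names, cv=5):
    """Drop the least important feature each round until one remains.
    Returns a list of (remaining feature names, mean CV accuracy) per round."""
    remaining = list(range(X.shape[1]))
    history = []
    while remaining:
        model = GradientBoostingClassifier(n_estimators=30, random_state=0)
        accuracy = cross_val_score(model, X[:, remaining], y, cv=cv).mean()
        history.append(([feature_names[i] for i in remaining], accuracy))
        if len(remaining) == 1:
            break
        # fit on all samples to score the features, then delete the weakest one
        model.fit(X[:, remaining], y)
        weakest = int(np.argmin(model.feature_importances_))
        del remaining[weakest]
    return history

# Hypothetical data: 8 candidate CpG islands, only the first two informative
rng = np.random.default_rng(5)
n_samples, n_features = 120, 8
y = np.repeat([0, 1], n_samples // 2)                    # 0 = normal, 1 = tumor
X = rng.uniform(0.0, 1.0, size=(n_samples, n_features))  # uninformative features
X[:, 0] = np.clip(0.35 + 0.25 * y + rng.normal(0, 0.1, n_samples), 0, 1)
X[:, 1] = np.clip(0.60 - 0.25 * y + rng.normal(0, 0.1, n_samples), 0, 1)
names = [f"cpg_{i}" for i in range(n_features)]

history = recursive_elimination(X, y, names)
for kept, accuracy in history:
    print(len(kept), round(accuracy, 3), kept)
```

Reading the history from the bottom up gives the trade-off described in step (3): the panel size is the smallest number of features at which cross-validated accuracy has not yet dropped noticeably.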
Embodiment 2
For the classifier that distinguishes different tumors, tumor samples of TCGA tumors are first analyzed in the following steps:
In this embodiment, the tumor samples can also come from other data sets.
In this embodiment, the tumor types can also be chosen according to actual needs.
(1) The methylation chip data of tumor patients with 26 cancers are merged, comprising 7605 patient samples; information on the 26 cancers is given in Table 3.
(2) Variance screening: delete probes whose variance (std) across all samples is less than 0.2.
(3) Differential methylation screening: for each cancer, select the top 20 probes that are differentially methylated between that cancer and the remaining 25 cancers (mean difference > 0.3 and FDR < 0.01). A total of 515 CpG islands are screened out and retained for further screening. These 515 CpG islands distinguish different tumors well.
(4) Recursive feature elimination: recursive feature elimination is carried out with the xgboost algorithm (the procedure is the same as the xgboost feature selection used in screening the diagnostic methylation markers), and 12 methylation markers are screened out. A OneVsRestClassifier multi-class model (a function in sklearn, used with default parameters) built from the methylation levels of these 12 CpG islands distinguishes each tumor type from the others well (Table 3, AUC > 0.8).
(5) Validation on independent data sets: the results of validation on public GEO data sets for breast cancer (GSE69914), colorectal cancer (GSE48684), prostate cancer (GSE73549), ovarian cancer (GSE65820), lung adenocarcinoma (GSE66836) and liver cancer (GSE89852) are also very good (Table 3), demonstrating the reproducibility and validity of the 12 selected CpG islands and of the multi-class model built on them.
The independent data sets can also come from other sources, including data sets generated in-house.
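The multi-class model of step (4) can be sketched with scikit-learn as follows. The text states that OneVsRestClassifier is used with default parameters but does not name the underlying binary estimator, so LogisticRegression is assumed here. The three tumor labels, the simulated methylation profiles and the train/test split are illustrative assumptions and not the patent's data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(4)
tumor_types = ["BRCA", "COAD", "LIHC"]
n_per_type, n_markers = 80, 12

# Hypothetical data: each tumor type has its own mean profile over 12 markers
profiles = rng.uniform(0.1, 0.9, size=(len(tumor_types), n_markers))
X = np.vstack([
    np.clip(profile + rng.normal(0, 0.1, size=(n_per_type, n_markers)), 0, 1)
    for profile in profiles
])
y = np.repeat(tumor_types, n_per_type)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# One binary classifier per tumor type: that type versus all others
# (base estimator is an assumption; the text does not specify it)
clf = OneVsRestClassifier(LogisticRegression())
clf.fit(X_train, y_train)
scores = clf.predict_proba(X_test)

# AUC of each tumor type against the rest on the held-out samples
per_type_auc = {}
for k, tumor_type in enumerate(clf.classes_):
    per_type_auc[str(tumor_type)] = roc_auc_score(y_test == tumor_type, scores[:, k])
print(per_type_auc)
```

Reporting one AUC per tumor type, as in Table 3, shows which types the marker panel separates well and which it confuses, which a single overall accuracy would hide.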
The scope of protection of the invention is not limited to the above embodiments. Variations and advantages that a person skilled in the art can conceive of without departing from the spirit and scope of the invention are all included in the invention, and the scope of protection is defined by the appended claims.
Claims (13)
1. A methylation marker for early tumor diagnosis, characterized in that the methylation marker is any one or a combination of the CpG islands corresponding to the 248 probes numbered 1 to 248 in Table 4.
2. A methylation marker for early tumor diagnosis, characterized in that the methylation marker is any one or a combination of the CpG islands corresponding to the 7 probes in Table 1; the 7 probes are cg10995381, cg27019093, cg04396850, cg25351606, cg02064267, cg20300343 and cg13974761.
3. Use of a reagent for detecting the methylation marker according to claim 1 or 2 in the preparation of a product for early tumor diagnosis.
4. A methylation marker for distinguishing different tumor types, characterized in that the methylation marker is any one or a combination of the CpG islands corresponding to the 515 probes numbered 1 to 515 in Table 5.
5. A methylation marker for distinguishing different tumor types, characterized in that the methylation marker is any one or a combination of the CpG islands corresponding to the 12 probes in Table 2.
6. Use of a reagent for detecting the methylation marker according to claim 4 or 5 in the preparation of a product for distinguishing different tumor types.
7. A clinical diagnosis product, characterized in that the clinical diagnosis product comprises a reagent for detecting the methylation level of any one or a combination of the CpG islands listed in Table 4.
8. A clinical diagnosis product, characterized in that the clinical diagnosis product comprises a reagent for detecting the methylation level of any one or a combination of the CpG islands listed in Table 5.
9. The methylation marker according to claim 1 or 2, or the application according to claim 4 or 5, characterized in that the tumor includes: bladder urothelial carcinoma, breast invasive carcinoma, colon cancer, esophageal cancer, kidney renal clear cell carcinoma, kidney renal papillary cell carcinoma, liver cancer, lung adenocarcinoma, lung squamous cell carcinoma, pancreatic cancer, prostate cancer, thyroid cancer, cervical squamous cell carcinoma, endocervical adenocarcinoma, cholangiocarcinoma, glioblastoma multiforme, pheochromocytoma and paraganglioma, rectal cancer, sarcoma, cutaneous melanoma, stomach adenocarcinoma, thymoma, adrenocortical carcinoma, kidney chromophobe cell carcinoma, acute myeloid leukemia, lower-grade glioma and ovarian serous cystadenocarcinoma.
10. A feature selection method for tumor markers, characterized in that the method comprises the following steps:
(1) performing differential methylation marker analysis with the CpG island as the unit;
(2) then taking the tumors whose number of paired normal-tumor samples exceeds a certain number as the training set;
(3) using XGBoost and a recursive feature elimination algorithm, and weighing clinical cost against prediction accuracy, selecting the top-ranked, more important CpG islands as features.
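Step (3) of claim 10 can be illustrated with a minimal Python sketch of recursive feature elimination. The claim names XGBoost as the source of the importance scores; to keep the example free of third-party dependencies, the hypothetical `mean_gap_importance` function below substitutes a simple tumor-versus-normal mean gap. All function names, island names and toy β values are assumptions made for illustration and are not taken from the patent.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

# Feature matrix: CpG island name -> one beta value per sample.
Matrix = Dict[str, List[float]]


def mean_gap_importance(X: Matrix, y: Sequence[int]) -> Dict[str, float]:
    """Stand-in importance score: |mean(tumor) - mean(normal)| per island.

    This is NOT XGBoost. A real pipeline would train a model on the
    remaining features and return its per-feature importance scores here.
    """
    scores = {}
    for name, values in X.items():
        tumor = [v for v, label in zip(values, y) if label == 1]
        normal = [v for v, label in zip(values, y) if label == 0]
        scores[name] = abs(mean(tumor) - mean(normal))
    return scores


def recursive_elimination(
    X: Matrix,
    y: Sequence[int],
    n_keep: int,
    importance: Callable[[Matrix, Sequence[int]], Dict[str, float]] = mean_gap_importance,
) -> List[str]:
    """Drop the lowest-scoring feature each round until n_keep remain."""
    if n_keep < 1:
        raise ValueError("n_keep must be at least 1")
    remaining = dict(X)
    while len(remaining) > n_keep:
        scores = importance(remaining, y)      # re-score on the survivors
        weakest = min(scores, key=scores.get)  # lowest importance this round
        del remaining[weakest]
    return list(remaining)


# Toy data: two tumor samples (label 1) followed by two normal samples (label 0).
toy_betas = {
    "island_A": [0.90, 0.80, 0.10, 0.20],
    "island_B": [0.50, 0.50, 0.50, 0.40],
    "island_C": [0.70, 0.60, 0.30, 0.40],
    "island_D": [0.20, 0.30, 0.25, 0.25],
}
toy_labels = [1, 1, 0, 0]
selected = recursive_elimination(toy_betas, toy_labels, n_keep=2)
print(selected)
```

The `importance` argument is the only place where the model enters, so a function that wraps a trained classifier could be passed in instead of the stand-in. The number of features to keep is left as a parameter because the claim weighs clinical cost against accuracy without fixing a count.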
11. A logistic regression model, characterized in that the logistic regression model is constructed from the methylation data of the CpG islands corresponding to the 248 probes in Table 4 by calling the LogisticRegression function in linear_model of sklearn, with all parameters left at their default values.
12. A screening method for cancer diagnosis methylation markers, characterized by comprising the following steps:
(1) data collection: collecting tumor DNA methylation data in which the number of paired normal tissue and tumor samples is greater than 10, as the training set; and collecting another part of the data as an independent validation set;
(2) data processing:
a) deleting the probes that map to the sex chromosomes;
b) deleting the probes that have missing values in more than 30% of the samples;
c) deleting the probes that have missing values in 30% of the samples;
d) imputing the remaining missing values with the EMimpute_array algorithm;
e) correcting the bias of the type II probes with the BMIQ software;
f) setting β values greater than 1 to 1, and β values less than 0 to 0;
(3) definition of differentially methylated CpG islands:
a) calculating Δβ: the Δβ of each probe equals the mean β value over all tumor samples minus the mean β value over all normal tissue samples;
b) calculating the p value and correcting it: the p value is calculated with a paired t-test and corrected with the Benjamini/Hochberg method;
c) taking the CpG islands whose absolute Δβ is greater than 0.2 and whose corrected p value is less than 0.05 as the differentially methylated CpG islands;
(4) feature selection: training on the training set with the XGBoost software package in Python to obtain an importance score for each feature; in each cycle, deleting the feature with the lowest score and continuing to train on the remaining features, until only the last several features remain, thereby obtaining the cancer diagnosis methylation markers.
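Steps (2) f) and (3) a) to c) of claim 12 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the paired t-test itself is not implemented, so the raw p values are supplied as inputs, and the function names, island names and toy numbers are hypothetical. The thresholds 0.2 and 0.05 are the ones stated in the claim.

```python
from statistics import mean
from typing import Dict, List


def clip_beta(value: float) -> float:
    """Step (2) f): force a beta value into the interval [0, 1]."""
    return min(1.0, max(0.0, value))


def delta_beta(tumor: List[float], normal: List[float]) -> float:
    """Step (3) a): mean tumor beta minus mean normal beta."""
    return mean(tumor) - mean(normal)


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Step (3) b): Benjamini/Hochberg adjusted p values, in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differential_islands(
    tumor: Dict[str, List[float]],
    normal: Dict[str, List[float]],
    raw_p: Dict[str, float],
    min_delta: float = 0.2,
    alpha: float = 0.05,
) -> List[str]:
    """Step (3) c): keep islands with |delta beta| > 0.2 and adjusted p < 0.05."""
    names = list(raw_p)
    adjusted = benjamini_hochberg([raw_p[name] for name in names])
    selected = []
    for name, q in zip(names, adjusted):
        t = [clip_beta(v) for v in tumor[name]]
        n = [clip_beta(v) for v in normal[name]]
        if abs(delta_beta(t, n)) > min_delta and q < alpha:
            selected.append(name)
    return selected


# Toy data: three paired samples per island; p values are assumed inputs
# standing in for the result of the paired t-test.
toy_tumor = {
    "island_A": [0.85, 0.90, 1.02],
    "island_B": [0.40, 0.45, 0.42],
    "island_C": [0.10, 0.15, -0.01],
}
toy_normal = {
    "island_A": [0.30, 0.35, 0.25],
    "island_B": [0.38, 0.44, 0.41],
    "island_C": [0.60, 0.55, 0.65],
}
toy_p = {"island_A": 0.001, "island_B": 0.60, "island_C": 0.04}
dm_islands = differential_islands(toy_tumor, toy_normal, toy_p)
print(dm_islands)
```

In this toy input, island_C has a large Δβ and a raw p value below 0.05, but its adjusted p value rises above the threshold after correction, which is why the claim applies the cutoff to the corrected value.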
13. A screening method for tumor classification methylation markers, characterized in that the method comprises the following steps:
(1) merging the methylation data of the tumor patients;
(2) variance screening: deleting the probes whose variance across all samples is less than 0.2;
(3) differential methylation screening: for each cancer, selecting several top-ranked CpG islands that are differentially methylated between that cancer and the remaining cancers, with mean difference > 0.3 and FDR < 0.01; thereby screening out the CpG islands that distinguish different tumors well.
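The variance screen and the one-versus-rest mean-difference screen of claim 13 can be sketched as follows. Several points are assumptions rather than statements from the claim: the sample variance is used, the mean difference is taken as an absolute value, the FDR < 0.01 condition is omitted because it needs a statistical test that is not shown here, and the function names, labels and toy β values are hypothetical.

```python
from statistics import mean, variance
from typing import Dict, List, Sequence


def variance_filter(
    betas: Dict[str, List[float]], min_variance: float = 0.2
) -> Dict[str, List[float]]:
    """Step (2): drop probes whose variance across all samples is below 0.2."""
    return {p: v for p, v in betas.items() if variance(v) >= min_variance}


def one_vs_rest_markers(
    betas: Dict[str, List[float]],
    labels: Sequence[str],
    min_diff: float = 0.3,
    top_k: int = 1,
) -> Dict[str, List[str]]:
    """Step (3), mean-difference part only: for each cancer type, rank probes
    by |mean(this type) - mean(all other types)| and keep the top_k above
    min_diff. Requires at least two distinct labels."""
    markers = {}
    for cancer in sorted(set(labels)):
        scored = []
        for probe, values in betas.items():
            inside = [v for v, lab in zip(values, labels) if lab == cancer]
            outside = [v for v, lab in zip(values, labels) if lab != cancer]
            diff = abs(mean(inside) - mean(outside))
            if diff > min_diff:
                scored.append((diff, probe))
        scored.sort(key=lambda item: item[0], reverse=True)
        markers[cancer] = [probe for _, probe in scored[:top_k]]
    return markers


# Toy merged data set: two samples for each of three hypothetical tumor types.
toy_labels = ["type_1", "type_1", "type_2", "type_2", "type_3", "type_3"]
toy_betas = {
    "probe_1": [0.99, 0.98, 0.01, 0.02, 0.01, 0.02],  # high only in type_1
    "probe_2": [0.02, 0.01, 0.01, 0.02, 0.98, 0.99],  # high only in type_3
    "probe_3": [0.50, 0.52, 0.48, 0.50, 0.51, 0.49],  # nearly constant
}
kept = variance_filter(toy_betas)
markers = one_vs_rest_markers(kept, toy_labels)
print(list(kept), markers)
```

Because β values lie between 0 and 1, a variance of 0.2 is only reached by probes that sit near 0 in some samples and near 1 in others, so the toy values are deliberately extreme.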
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710962721.4A CN109680060A (en) | 2017-10-17 | 2017-10-17 | Methylate marker and its application in diagnosing tumor, classification |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109680060A (en) | 2019-04-26 |
Family
ID=66182751
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710962721.4A Pending CN109680060A (en) | 2017-10-17 | 2017-10-17 | Methylate marker and its application in diagnosing tumor, classification |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109680060A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100297641A1 (en) * | 2000-10-27 | 2010-11-25 | University Of Southern California | Methylation altered dna sequences as markers associated with human cancer |
AU2002228000A1 (en) * | 2000-12-07 | 2002-06-18 | Europroteome Ag | Expert system for classification and prediction of genetic diseases |
US20020192686A1 (en) * | 2001-03-26 | 2002-12-19 | Peter Adorjan | Method for epigenetic feature selection |
CN102089654A (en) * | 2008-03-10 | 2011-06-08 | 莱恩进公司 | COPD biomarker signatures |
CN106980774A (en) * | 2017-03-29 | 2017-07-25 | 电子科技大学 | A kind of extended method of DNA methylation chip data |
CN107034295A (en) * | 2017-06-05 | 2017-08-11 | 天津医科大学肿瘤医院 | For early diagnosis of cancer and the DNA methylation index of Hazard degree assessment and its application |
Non-Patent Citations (2)
Title |
---|
BABAJIDE MUSTAPHA I, SAEED F: ""Bioactive Molecule Prediction Using Extreme Gradient Boosting"", 《MOLECULES》 * |
YANG X, GAO L, ZHANG S.: "Comparative pan-cancer DNA methylation analysis reveals cancer common and specific patterns", 《BRIEF BIOINFORM.》 * |
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020012367A3 (en) * | 2018-07-09 | 2020-04-09 | Hkg Epitherapeutics Limited | Dna methylation markers for noninvasive detection of cancer and uses thereof |
CN110310700B (en) * | 2019-07-02 | 2022-09-13 | 河海大学常州校区 | DNA methylation chip mark site screening method based on deep learning model |
CN110310700A (en) * | 2019-07-02 | 2019-10-08 | 河海大学常州校区 | DNA methylation chip mark site selection method based on deep learning model |
CN111370129A (en) * | 2020-04-20 | 2020-07-03 | 上海鹍远生物技术有限公司 | Thyroid tumor benign and malignant identification model and application thereof |
CN111370129B (en) * | 2020-04-20 | 2021-06-08 | 上海鹍远生物技术有限公司 | Thyroid tumor benign and malignant identification model and application thereof |
CN113817822A (en) * | 2020-06-19 | 2021-12-21 | 中国医学科学院肿瘤医院 | Tumor diagnosis kit based on methylation detection and application thereof |
CN113817822B (en) * | 2020-06-19 | 2024-02-13 | 中国医学科学院肿瘤医院 | Tumor diagnosis kit based on methylation detection and application thereof |
WO2022142677A1 (en) * | 2020-12-30 | 2022-07-07 | 上海奕谱生物科技有限公司 | Tumor marker and application thereof |
CN112779334B (en) * | 2021-02-01 | 2022-05-27 | 杭州医学院 | Methylation marker combination for early screening of prostate cancer and screening method |
CN112779334A (en) * | 2021-02-01 | 2021-05-11 | 杭州医学院 | Methylation marker combination for early screening of prostate cancer and screening method |
CN112941180A (en) * | 2021-02-25 | 2021-06-11 | 浙江大学医学院附属妇产科医院 | Group of lung cancer DNA methylation molecular markers and application thereof in preparation of lung cancer early diagnosis kit |
WO2022190752A1 (en) * | 2021-03-12 | 2022-09-15 | 富士フイルム株式会社 | Cancer test reagent set, method for producing cancer test reagent set, and cancer test method |
CN112852969A (en) * | 2021-04-19 | 2021-05-28 | 温州医科大学 | Epigenetically modified lncRNA as tumor diagnosis or tumor progression prediction marker |
WO2023078283A1 (en) * | 2021-11-04 | 2023-05-11 | 广州市基准医疗有限责任公司 | Methylation biomarker for breast cancer diagnosis and use thereof |
CN114606316A (en) * | 2022-03-12 | 2022-06-10 | 中山大学肿瘤防治中心(中山大学附属肿瘤医院、中山大学肿瘤研究所) | Construction method of model for early diagnosis and prognosis prediction of NK/T cell lymphoma |
CN115424666A (en) * | 2022-09-13 | 2022-12-02 | 江苏先声医学诊断有限公司 | Method and system for screening pan-cancer early-screening molecular marker based on whole genome bisulfite sequencing data |
CN116004836A (en) * | 2023-02-11 | 2023-04-25 | 广东医科大学 | Application of reagent for preparing target gene of novel coronavirus in regulation of tumorigenesis and prognosis |
CN116042837A (en) * | 2023-02-23 | 2023-05-02 | 深圳市海普洛斯生物科技有限公司 | Methylation marker combination for detecting urinary system cancer species and screening method thereof |
CN117079723A (en) * | 2023-10-13 | 2023-11-17 | 北京大学第三医院(北京大学第三临床医学院) | Biomarker and diagnostic model related to amyotrophic lateral sclerosis and application of biomarker and diagnostic model |
CN117079723B (en) * | 2023-10-13 | 2024-02-02 | 北京大学第三医院(北京大学第三临床医学院) | Biomarker and diagnostic model related to amyotrophic lateral sclerosis and application of biomarker and diagnostic model |
CN117594133A (en) * | 2024-01-19 | 2024-02-23 | 普瑞基准科技(北京)有限公司 | Screening method of biomarker for distinguishing uterine lesion type and application thereof |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109680060A (en) | Methylate marker and its application in diagnosing tumor, classification | |
CN105219844B (en) | Gene marker combination, kit and the disease risks prediction model of a kind of a kind of disease of screening ten | |
CN106047998A (en) | Detection method and application of lung cancer genes | |
CN105506115B (en) | DNA library for detecting and diagnosing genetic cardiomyopathy pathogenic genes and application thereof | |
CN111833963B (en) | CfDNA classification method, device and application | |
Li et al. | A meta-analysis of the effect of microRNA-34a on the progression and prognosis of gastric cancer. | |
CN107423578A (en) | Detect the device of somatic mutation | |
CN110060733A (en) | Second-generation sequencing tumor somatic variation detection device based on single sample | |
CN115424666B (en) | Method and system for screening early-stage screening sub-markers of pan-cancer based on whole genome bisulfite sequencing data | |
CN109423519A (en) | Early pancreatic carcinoma marker and its detection method | |
CN109616198A (en) | It is only used for the choosing method of the special DNA methylation assay Sites Combination of the single cancer kind screening of liver cancer | |
CN105603101A (en) | Application of system for detecting expression quantity of eight miRNAs in preparation of product for diagnosing or assisting in diagnosing hepatocellular carcinoma | |
CN113257360B (en) | Cancer screening model, and construction method and construction device of cancer screening model | |
CN107435062A (en) | Screen good pernicious peripheral blood gene marker of small pulmonary nodules and application thereof | |
CN108531597A (en) | A kind of detection kit for oral squamous cell carcinomas early diagnosis | |
JP6134809B2 (en) | Cancer diagnostic equipment | |
CN106295244B (en) | Screening method of tumor diagnosis marker, breast cancer lung metastasis related gene obtained by method and application of breast cancer lung metastasis related gene | |
CN108949979A (en) | A method of judging that Lung neoplasm is good pernicious by blood sample | |
CN104818322B (en) | MiRNA and Cyfra21 1 combine the application in detection non-small cell lung cancer | |
KR101223270B1 (en) | Method for determining low―mass ions to screen colorectal cancer, method for providing information to screen colorectal cancer by using low―mass ions, and operational unit therefor | |
CN107273717A (en) | A kind of detection model of Sera of Lung Cancer gene and its construction method and application | |
KR102217272B1 (en) | Extracting method of disease diagnosis biomarkers using mutation information in whole genome sequence | |
CN109735619A (en) | Molecular marker relevant to non-small cell lung cancer prognosis and its application | |
CN106636440B (en) | Blood plasma microRNAs is used to prepare the purposes of the diagnostic reagent of patients with lung adenocarcinoma in sieving and diagnosis male population | |
CN108070656A (en) | Lung cancer marker and its application |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20190426 |