CN108424970B

CN108424970B - Biomarkers and methods for detecting risk of cancer recurrence

Info

Publication number: CN108424970B
Application number: CN201810606265.4A
Authority: CN
Inventors: 唐大木; 何立智; 陈争; 陈婧; 赵坤成; 赵凤娟; 曾永柯; 马靖翔
Original assignee: Shenzhen Yikang Biological Technology Co ltd
Current assignee: Shenzhen Muguang Biotechnology Co ltd
Priority date: 2018-06-13
Filing date: 2018-06-13
Publication date: 2021-04-13
Anticipated expiration: 2038-06-13
Also published as: WO2019237641A1; CN108424970A; CN112941184A

Abstract

The invention relates to a biomarker and a detection method for detecting the risk of cancer recurrence. The biomarker comprises at least one of the following genes: SLCO2A1, CGNL1, SUPV3L1, TATDN2, MGAT4B, VAV2, SLC25A33, MCCC1, ASNS, CASKIN1, DNMT3B, AURKA, OIP5, CTHRC1, and GOLGA7B. The biomarker of the present invention has the advantage of effectively predicting the risk of recurrence of cancers such as prostate cancer.

Description

Biomarkers and methods for detecting risk of cancer recurrence

Technical Field

The invention relates to a prostate cancer detection technology, in particular to a biomarker for detecting cancer recurrence risk and a method for detecting cancer recurrence risk.

Background

Prostate cancer is the most common malignant tumor in men in developed countries, and its incidence is also rising rapidly in China. The course of the disease varies widely between patients. Although the long-term prognosis for most low-grade tumors (Gleason score of 6 or less, WHO grade group I) is good, approximately 30% of cancers recur after radical prostatectomy. The main indicator of such recurrence is a rise in serum prostate-specific antigen (PSA), also known as biochemical recurrence. Elevated serum PSA indicates a high risk of metastasis. The standard therapy for metastatic prostate cancer is androgen deprivation therapy (ADT), a palliative treatment. Prostate cancer that recurs after ADT fails is called castration-resistant prostate cancer (CRPC). The biochemical recurrence phase, marked by rising PSA, is a critical window for early targeted therapy. It is therefore highly desirable to assess the risk of biochemical recurrence of prostate cancer.

Currently, three commercially available multigene assays based on mRNA expression are used to assess prostate cancer recurrence: Oncotype DX (Genomic Prostate Score, GPS), Prolaris (cell cycle progression, CCP), and Decipher (Genomic Classifier, GC). After definitive diagnosis and radical surgery, the 17-gene Oncotype DX and 31-gene Prolaris assays help stratify patients at high risk of recurrence. The 22-gene Decipher assay predicts the risk of metastasis after radical treatment. While these biomarkers help in designing personalized treatment plans, their clinical value needs further validation. Despite significant advances in prostate cancer biomarker research, risk assessment of recurrence after radical therapy, and patient stratification based on it, remain substantially inadequate. Part of the reason may be that the molecular networks that drive prostate cancer are quite complex.

The mucin 1 (MUC1) pathway plays an important role in biochemical recurrence after radical prostatectomy. MUC1 is an extensively studied tumor-associated antigen, in part because this cell membrane glycoprotein is expressed on the apical surface of most epithelial tissues. Glycosylation of MUC1 appears to be altered in 70% of cancers. In many tumors MUC1 contributes to tumor progression by activating important oncogenic proteins in various pathways, including EGFR, β-catenin, NF-κB, and PKM2. In prostate cancer, MUC1 expression is up-regulated and aberrant glycosylation occurs. These abnormalities are associated with angiogenesis and adverse clinical features. Up-regulation of MUC1 correlates weakly with shortened disease-free survival (DFS) and overall survival (OS), and with malignant histopathology after radical prostate cancer therapy. A three-gene signature (AZGP1, MUC1 and p53) was associated with poor prognosis in patients with primary prostate cancer. Increased expression of MUC1 mRNA can be detected in metastatic prostate cancer. Genomic alterations in a 25-gene MUC1 network were modestly associated with prostate cancer recurrence.

Disclosure of Invention

The technical problem to be solved by the invention is to provide, based on the genomic alterations of the 25-gene MUC1 network, a biomarker for detecting the risk of cancer recurrence that can effectively predict the recurrence risk of cancers such as prostate cancer, and to provide a method for detecting the risk of cancer recurrence based on this biomarker.

In order to solve the technical problems, the invention adopts the technical scheme that:

A biomarker for detecting the risk of cancer recurrence, comprising at least one of the 696 differentially expressed genes listed in Table 1 of the specification.

A combination of genes for detecting the risk of cancer recurrence, comprising at least one of the following genes: SLCO2A1, CGNL1, SUPV3L1, TATDN2, MGAT4B, VAV2, SLC25A33, MCCC1, ASNS, CASKIN1, DNMT3B, AURKA, OIP5, CTHRC1, and GOLGA7B.

A combination of genes for detecting the risk of cancer recurrence, comprising at least one of the following characteristic genomes: SigCut1, SigCut2, SigCut3, and SigMuc1NW1;

SigCut1 includes the following genes: MGAT4B, AURKA and OIP5;

SigCut2 includes the following genes: TATDN2, MGAT4B, VAV2, AURKA, and OIP5;

SigCut3 includes the following genes: SLCO2A1, CGNL1, SUPV3L1, TATDN2, MGAT4B, VAV2, SLC25A33, MCCC1, ASNS, CASKIN1, DNMT3B, AURKA, OIP5, CTHRC1, and GOLGA7B;

SigMuc1NW1 includes the following genes: CGNL1, MGAT4B, VAV2, ASNS, CASKIN1, DNMT3B, AURKA, OIP5, CTHRC1, and GOLGA7B.

A method for detecting the risk of cancer recurrence, comprising examining changes in the expression of the genes in the above gene combination to diagnose or predict the patient's risk of death.

PCR, DNA chip (microarray), NanoString or RNA sequencing can be used to detect low or high mRNA expression of the genes in the biomarker.

The subject of the method is a human or another mammal.

The invention has the beneficial effects that:

The biomarker for detecting the risk of cancer recurrence of the present invention makes full use of the potential value of MUC1 as a novel biomarker and develops an effective characteristic genome to predict the recurrence of cancers such as prostate cancer. The invention constructs a 15-gene characteristic genome and several sub-signatures. These characteristic genomes are very effective in predicting recurrence after radical prostate cancer therapy in two independent prostate cancer databases (n = 492 and n = 140). In addition, these characteristic genomes are closely associated with reduced disease-free survival and overall survival in a variety of other cancer types.

Drawings

FIG. 1 shows the strategy for generating the characteristic genomes of the present technology patent;

FIGS. 2A-B show the selective covariate analysis of 696 genes using the Elastic-net method;

FIG. 3 shows the gene expression of the characteristic genome (SigMuc1NW) of the selected 15 genes;

FIGS. 4A-B show that SigMuc1NW correlates with decreased disease-free survival (DFS) and Overall Survival (OS) in prostate cancer patients;

FIG. 5 shows the overlap between our previously reported 9-gene characteristic genome [21] and SigMuc1NW;

FIGS. 6A-C show that the two characteristic genomes of FIG. 5 are significantly correlated with a reduction in disease-free survival (DFS) and Overall Survival (OS) for prostate cancer patients;

FIGS. 7A-D show that the SigMuc1NW score can effectively stratify prostate cancers with a high risk of recurrence;

FIG. 8 shows the estimated cut-off points for SigMuc1NW scores;

FIG. 9 shows that all 15 genes of SigMuc1NW are significantly associated with prostate cancer recurrence and the presence of three acquired sub-signature genomes;

FIGS. 10A-E show that SigCut1, SigCut2 and SigCut3 correlate significantly with a reduction in disease-free survival (DFS);

FIGS. 11A-C show that SigMuc1NW scores effectively stratify prostate cancers with a high risk of recurrence;

FIGS. 12A-C show changes in the expression of component genes in another prostate cancer population independent of the TCGA database;

FIGS. 13A-E show that SigMuc1NW1 is very effective in predicting prostate cancer recurrence in the database of FIG. 12;

FIGS. 14A-B show that SigMuc1NW1 significantly correlated with a reduction in disease-free survival (DFS) and Overall Survival (OS) for TCGA sub-database prostate cancer patients.

Detailed Description

In order to explain technical contents, achieved objects, and effects of the present invention in detail, the following description is made with reference to the accompanying drawings in combination with the embodiments.

Starting from the finding that genomic alterations in the 25-gene MUC1 network are modestly associated with prostate cancer recurrence, the inventors demonstrated that alterations in 9 of those 25 genes significantly strengthen this association. The present invention aims to exploit the potential value of MUC1 as a novel biomarker to develop an effective set of signature genes for predicting recurrence of prostate cancer. In the course of this work, the inventors found 696 differentially expressed genes (DEGs) in the TCGA clinical database of prostate cancer from cBioPortal that are associated with the 9-gene characteristic genome. From these DEGs the inventors further constructed a 15-gene characteristic genome and multiple sub-signatures. These characteristic genomes are very effective in predicting recurrence after radical prostate cancer therapy in two independent prostate cancer databases (n = 492 and n = 140). In addition, these characteristic genomes are also closely associated with reduced disease-free survival and overall survival in a variety of other cancer types.

The present invention obtains 696 differentially expressed genes associated with the 9-gene MUC1 characteristic genome from the TCGA database of prostate cancer (n = 492). The effect of all altered genes on prostate cancer recurrence was analyzed using Elastic-net logistic regression. For this analysis, the 416 down-regulated genes were scored at expression more than 1.5 SD (standard deviations) below the mean, and the 280 up-regulated genes at expression more than 2 SD above the mean. From this analysis of the 696 genes, a 15-gene characteristic genome (SigMuc1NW) was obtained, namely: SLCO2A1, CGNL1, SUPV3L1, TATDN2, MGAT4B, VAV2, SLC25A33, MCCC1, ASNS, CASKIN1, DNMT3B, AURKA, OIP5, CTHRC1, and GOLGA7B. From SigMuc1NW, the inventors further derived four sub-signatures, namely: SigCut1, SigCut2, SigCut3, and SigMuc1NW1.

SigMuc1NW strongly predicts biochemical recurrence after radical treatment, with a sensitivity of 56.4% and a specificity of 72.6%. The median months disease-free (MMDF) was 63.24 months for SigMuc1NW-positive prostate cancer patients, whereas SigMuc1NW-negative patients remained disease-free significantly longer and had not reached a median even at the end of the 160-month follow-up period (p = 1.12e-12). The time-dependent area under the curve (tAUC) of SigMuc1NW was 76.6% at 11.5 months, 73.8% at 22.3 months, 78.5% at 32.1 months, and 76.4% at 48.4 months. SigMuc1NW is associated with features of malignancy in prostate cancer: high Gleason score (odds ratio (OR) 1.48, p < 2e-16) and advanced tumor stage (OR 1.33, p = 4.37e-13).
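As an illustration of how sensitivity and specificity figures of this kind are computed from signature calls and recurrence status, the following minimal Python sketch may help. The function name and the toy cohort are hypothetical and are not data from the patent.

```python
def sensitivity_specificity(predicted_positive, recurred):
    """Return (sensitivity, specificity) for two equal-length lists of booleans."""
    if len(predicted_positive) != len(recurred):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(predicted_positive, recurred))
    tp = sum(1 for p, r in pairs if p and r)          # signature-positive, recurred
    fn = sum(1 for p, r in pairs if not p and r)      # signature-negative, recurred
    tn = sum(1 for p, r in pairs if not p and not r)  # signature-negative, no recurrence
    fp = sum(1 for p, r in pairs if p and not r)      # signature-positive, no recurrence
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical toy cohort of 8 patients (illustration only).
calls = [True, True, False, False, True, False, False, False]
recur = [True, True, True, False, False, False, False, False]
sens, spec = sensitivity_specificity(calls, recur)
```

Sensitivity is the fraction of recurrent cancers that are signature-positive, and specificity is the fraction of non-recurrent cancers that are signature-negative. The sketch assumes that both outcome classes are present in the cohort.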

The tAUC values of SigCut1 (comprising the MGAT4B, AURKA and OIP5 genes) for distinguishing recurrent from non-recurrent prostate cancer were 74.3% at 11.5 months, 73.8% at 22.3 months, 78.5% at 32.1 months and 76.4% at 48.4 months. The median disease-free period of SigCut1-positive prostate cancer was 69.1 months, whereas SigCut1-negative prostate cancer had not reached a median disease-free period by the end of follow-up (p = 4.8e-10).

The tAUC values of SigCut2 (comprising the TATDN2, MGAT4B, VAV2, AURKA and OIP5 genes) for distinguishing recurrent from non-recurrent prostate cancer were 75.9% at 11.5 months, 73.4% at 22.3 months, 76.5% at 32.1 months and 75.3% at 48.4 months. The median disease-free period of SigCut2-positive prostate cancer was 32.5 months, whereas SigCut2-negative prostate cancer had not reached a median disease-free period by the end of follow-up (p = 0).

SigCut3 includes all 15 component genes of SigMuc1NW. Based on the mRNA expression of these genes at their cutpoints, SigCut3 predicted prostate cancer recurrence with 67% sensitivity and 75.7% specificity. The median disease-free period was 45.2 months for SigCut3-positive prostate cancer, whereas SigCut3-negative prostate cancer had not reached a median disease-free period (p = 0). The tAUC values of SigCut3 for distinguishing recurrent from non-recurrent prostate cancer were 76.5% at 11.5 months, 73.8% at 22.3 months, 78.5% at 32.1 months, and 76.4% at 48.4 months.

SigMuc1NW1 consists of 10 genes: CGNL1, MGAT4B, VAV2, ASNS, CASKIN1, DNMT3B, AURKA, OIP5, CTHRC1 and GOLGA7B. SigMuc1NW1 was used to predict recurrence of prostate cancer in the TCGA database (n = 492). In this cohort, the median disease-free period of SigMuc1NW1-positive cancers was 60.91 months, whereas negative cancers had not reached a median disease-free period by the end of follow-up (p = 3.14e-12). When SigMuc1NW1 was applied to another prostate cancer database (MSKCC, n = 140), the median disease-free period of SigMuc1NW1-positive cancers was 11.8 months, whereas negative cancers had not reached a median disease-free period by the end of follow-up (p = 3.11e-15). The tAUC values of SigMuc1NW1 for distinguishing recurrent from non-recurrent prostate cancer were 82.5% at 18.4 months, 78.5% at 38 months, 76.6% at 51.4 months and 78.2% at 65 months.

After adjustment for known clinical high-risk factors, including age, Gleason grade, positive surgical margin and tumor stage, SigMuc1NW1 and SigCut3 remain independent risk factors for predicting post-operative recurrence of prostate cancer.

SigMuc1NW and SigMuc1NW1 are associated with shortening of Overall Survival (OS) for multiple cancer types. See in particular the following:

breast cancer: curtis database (n 1980, p 0.00447/SigMuc1NW and p 0.000575/SigMuc1NW1) and TCGA sub-database (n 1093, p 0.022 and p 0.0586);

low grade glioma (n-516, p-4.92 e-5 and p-0.000191);

head and neck squamous cell carcinoma (n 520, p 0.0368/SigMuc1NW 1);

clear cell renal cell carcinoma (ccRCC, n-533, p-1.58 e-5 and p-2 e-7);

papillary renal cell carcinoma (pRCC, n 290, p 0.00289 and p 1.172 e-5);

hepatocellular carcinoma (n 371, p 0.048 and p 0.0349);

sarcomas (n 259, p 0.00813 and p 0.000851);

thyroid cancer (n 501, p 1.01e-5 and p 0.000742);

endometrial cancer of the uterus (n 177, p 0.0244 and p 0.026).

SigMuc1NW and SigMuc1NW1 are associated with disease-free survival (DFS) shortening for multiple cancer types. See in particular the following:

low grade glioma (n = 516, p = 00183 and p = 0.00511);

clear cell renal cell carcinoma (n = 533, p = 0.00485 and p = 0.000264);

papillary renal cell carcinoma (n = 290, p = 0.00379/SigMuc1NW1);

renal chromophobe carcinoma (n = 66, p = 0.03 and p = 0.0235);

hepatocellular carcinoma (n = 371, p = 0.0105 and p = 0.00813);

sarcomas (n = 259, p = 0.000768 and p = 1.1e-5);

cutaneous melanoma (n = 469, p = 0.00566/SigMuc1NW1);

thyroid cancer (n = 501, p = 0.0506/SigMuc1NW1).

The invention diagnoses and assesses a patient's risk of relapse after radical prostate cancer therapy by examining alterations in the 15 genes of the SigMuc1NW characteristic genome and in its sub-signatures (SigMuc1NW1, SigCut1, SigCut2 and SigCut3). It can be used to diagnose and evaluate the risk of death of prostate cancer patients, the risk of recurrence at initial diagnosis of prostate cancer, and the risk of metastasis and of progression to castration-resistant prostate cancer (CRPC) after radical treatment.

The 15 genes of the SigMuc1NW characteristic genome may be used in different combinations; that is, combinations other than the above-described SigMuc1NW1, SigCut1, SigCut2, and SigCut3 may also be used. This is because each of the 15 genes on its own can predict biochemical recurrence of cancer.

The invention is illustrated in detail in figures 1 to 14 of the accompanying drawings as follows:

In FIG. 1, the TCGA sub-database in cBioPortal holds gene expression data, obtained by RNA sequencing, for 492 prostate cancer patients. Based on the 9-gene characteristic genome derived from the MUC1-regulated gene network, we first divided the population into two groups: one (n = 100) positive for the 9-gene characteristic genome and the other (n = 392) negative. Comparing mean mRNA expression between these two groups yielded 696 differentially expressed genes (DEGs) (q < 0.001), consisting of 416 down-regulated genes and 280 up-regulated genes. For each of the 416 down-regulated genes, tumors were assigned to the down group (-1.5SD) when expression of the gene was more than 1.5 SD (standard deviations) below the population mean. For each of the 280 up-regulated genes, tumors were assigned to the up group (+2SD) when expression of the gene was more than 2 SD above the population mean. We then used the Elastic-net penalty in the R glmnet package for model construction, and analyzed the influence of the 696 DEGs on biochemical recurrence of prostate cancer by regularized covariate selection with this model.
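The SD-based grouping described for FIG. 1 can be sketched in Python as follows. This is a minimal illustration, assuming that the reference population is simply all tumors in the input list and that the population standard deviation is used; the function name, parameter names and toy values are hypothetical.

```python
from statistics import mean, pstdev

def flag_altered(expression, direction, down_sd=-1.5, up_sd=2.0):
    """Binary-code one gene: 1 if the tumor's z-score crosses the threshold, else 0.

    expression: mRNA values for one gene across all tumors (assumed reference population).
    direction:  "down" flags z < down_sd, "up" flags z > up_sd.
    """
    mu = mean(expression)
    sd = pstdev(expression)
    if sd == 0:
        return [0] * len(expression)  # no variation, nothing can be flagged
    z = [(x - mu) / sd for x in expression]
    if direction == "down":
        return [1 if v < down_sd else 0 for v in z]
    if direction == "up":
        return [1 if v > up_sd else 0 for v in z]
    raise ValueError("direction must be 'down' or 'up'")

# Hypothetical expression values for one up-regulated and one down-regulated gene.
up_calls = flag_altered([10] * 9 + [100], "up")
down_calls = flag_altered([100] * 9 + [10], "down")
```

Each gene then contributes one 0/1 column per tumor, which is the kind of covariate matrix a penalized regression can select from.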

In FIG. 2, cross-validation (CV) curves are shown with the mixing parameter α set to 0.2 (A) and 0.8 (B). The number of non-zero coefficients (covariates) at each λ value (the tuning parameter that sets the penalty level) is shown at the top of each graph. The rightmost vertical line marks the minimum of the CV curve, and the vertical line to its left marks the λ whose CV error is within one standard deviation of the minimum. The model is built at the λ value marked by the left vertical line.
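For reference, the roles of α and λ can be read from the penalized objective that Elastic-net logistic regression minimizes. The form below follows the standard glmnet formulation; the symbols are assumptions introduced here for illustration: y_i is the recurrence status of patient i, x_i the vector of binary-coded gene alterations, N the number of patients, and β_0, β the intercept and coefficients.

```latex
\min_{\beta_0,\,\beta}\;
-\frac{1}{N}\sum_{i=1}^{N}\Bigl[\,y_i\,(\beta_0 + x_i^{\top}\beta)
 - \log\bigl(1 + e^{\beta_0 + x_i^{\top}\beta}\bigr)\Bigr]
\;+\;\lambda\Bigl[\tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^{2}
 + \alpha\,\lVert\beta\rVert_1\Bigr]
```

With α close to 1 the L1 term dominates and more coefficients are driven to exactly zero, while α close to 0 behaves more like ridge regression and tends to keep groups of correlated covariates together. A larger λ strengthens the overall penalty.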

In FIG. 3, using the procedure described in FIGS. 1 and 2, we selected these 15 genes from the 696 genes based on their effect on prostate cancer recurrence. In prostate cancers of the TCGA database, SLCO2A1 and CGNL1 were down-regulated by 1.5 standard deviations (-1.5SD) and the remaining genes were up-regulated by 2 standard deviations (+2SD). The expression of these differentially expressed genes (DEGs) in prostate cancers of the TCGA database is shown using OncoPrint (top gray panel) and clustering (bottom color image). The figure also includes disease-free status. The graph was generated using the tools provided by cBioPortal.

In FIG. 4, a TCGA sub-database was used for these analyses. (A) The effect of SigMuc1NW on DFS. MDF: months disease-free; MMDF: median months disease-free; NA: no median was reached. The graph includes the number of at-risk individuals over the specified follow-up period. (B) The effect of SigMuc1NW on OS. MMS: median months of survival. Kaplan-Meier analysis and the log-rank test were performed using the survival package of R.
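To make the MMDF and "NA" entries concrete, the following Python sketch computes a Kaplan-Meier curve and reads off the median. It is a simplified illustration and not the R survival implementation used for the figures; the function names and the toy follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one event.

    times:  follow-up in months; events: 1 = recurrence observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            surv *= 1.0 - n_events / at_risk
            curve.append((t, surv))
        at_risk -= n_leaving
    return curve

def median_time(curve):
    """First time at which survival is 0.5 or lower; None if the median is never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

# Hypothetical signature-positive and signature-negative groups.
positive = kaplan_meier([5, 8, 12, 20, 30, 40], [1, 1, 0, 1, 0, 0])
negative = kaplan_meier([10, 50, 100, 160], [1, 0, 0, 0])
```

In the toy negative group the curve never falls to 50%, so `median_time` returns `None`, which corresponds to the "NA" case in the figure legends.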

In FIG. 5, the graph was generated using a TCGA sub-database (n = 492, cBioPortal). The figure shows only the genes that are altered in both characteristic genomes.

In FIG. 6, the analysis was performed using a TCGA sub-database. (A, B) The 9-gene characteristic genome used to construct SigMuc1NW was combined with SigMuc1NW, and the effect of the combination on DFS (A) and OS (B) was examined. (C) The effect on DFS of our previously identified 9-gene characteristic genome, using the TCGA database.

In FIG. 7, (A) SigMuc1NW scores were calculated for all tumors in the TCGA sub-database. Time-dependent ROC (tROC) analysis of the scores was used to identify tumors with a high risk of recurrence. The figure shows the AUC (tAUC) over the specified time period and the disease recurrence status. DF: disease-free. (B) The cutpoint of the SigMuc1NW score effectively separates prostate cancers with a low risk of recurrence from those with a high risk (see FIG. 8 for details). A binary code was then assigned to each tumor based on the cutpoint. Finally, the effect of the cutpoint on disease-free survival (DFS) of TCGA cohort patients was determined. (C, D) Effect of the mean and Q3 of the SigMuc1NW score on biochemical recurrence of prostate cancer in the TCGA sub-database. Kaplan-Meier analysis and the log-rank test were performed using the R survival package. The vertical dotted line shows the median disease-free period. The colored dotted curves show the 95% confidence interval.
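The idea behind a time-dependent AUC can be sketched in Python as below. This is a deliberately simplified cumulative/dynamic version: patients who recurred by the horizon are treated as cases, patients still followed beyond the horizon as controls, and patients censored before the horizon are dropped. Proper tROC estimators correct for censoring, so this sketch is not the method used for FIG. 7; all names and data are hypothetical.

```python
def cumulative_dynamic_auc(scores, times, events, horizon):
    """Probability that a case (recurred by `horizon`) outscores a control
    (still followed after `horizon`). Ties count one half.
    Returns None when either group is empty."""
    cases = [s for s, t, e in zip(scores, times, events) if e and t <= horizon]
    controls = [s for s, t in zip(scores, times) if t > horizon]
    if not cases or not controls:
        return None
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))

# Hypothetical scores, follow-up times in months, and recurrence indicators.
scores = [3.0, 0.8, 1.0, 0.5]
times = [5, 30, 40, 50]
events = [1, 1, 0, 0]
auc_12 = cumulative_dynamic_auc(scores, times, events, horizon=12)
auc_35 = cumulative_dynamic_auc(scores, times, events, horizon=35)
```

Evaluating the function at several horizons gives a curve of AUC against follow-up time, which is the kind of summary reported as tAUC.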

In FIG. 8, SigMuc1NW scores were calculated for all 492 patients in the TCGA sub-database. The cutpoint was estimated using maximally selected rank statistics (maxstat package) in R. The vertical dashed line shows the cutpoint and its associated p-value.
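The principle of cutpoint estimation can be illustrated with a simplified Python sketch that scans candidate cutpoints and keeps the one giving the largest two-group log-rank statistic. It is a stand-in for the idea only: the maxstat package uses standardized rank statistics and adjusts the p-value for the multiple cutpoints tested, which this sketch does not do. Function names, the `min_group` parameter and the toy data are hypothetical.

```python
def logrank_chi2(times, events, in_group):
    """Two-group log-rank chi-square statistic (one degree of freedom)."""
    observed = expected = variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for x in times if x >= t)                            # at risk overall
        n1 = sum(1 for x, g in zip(times, in_group) if x >= t and g)   # at risk in group
        d = sum(1 for x, e in zip(times, events) if x == t and e)      # events at t
        d1 = sum(1 for x, e, g in zip(times, events, in_group)
                 if x == t and e and g)                                # events in group
        observed += d1
        expected += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return (observed - expected) ** 2 / variance if variance > 0 else 0.0

def best_cutpoint(scores, times, events, min_group=2):
    """Return (cutpoint, chi2) maximizing the log-rank statistic for score > cutpoint."""
    best = (None, 0.0)
    for c in sorted(set(scores))[:-1]:
        high = [s > c for s in scores]
        if min(sum(high), len(high) - sum(high)) < min_group:
            continue  # skip splits that leave a group too small
        stat = logrank_chi2(times, events, high)
        if stat > best[1]:
            best = (c, stat)
    return best

# Hypothetical cohort: low scores with long event-free follow-up, high scores with early events.
scores = [0.1, 0.2, 0.3, 0.4, 2.0, 2.1, 2.2, 2.3]
times = [100, 110, 120, 130, 10, 12, 14, 16]
events = [0, 0, 0, 0, 1, 1, 1, 1]
cut, stat = best_cutpoint(scores, times, events)
```

Tumors with a score above the returned cutpoint would then be coded 1 and the rest 0 before the Kaplan-Meier comparison.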

In FIG. 9, mRNA expression data for the 15 genes were obtained from a TCGA sub-database (cBioPortal), an individual cutpoint was derived for each gene, and all tumors were binary-coded accordingly. Univariate Cox proportional hazards (PH) analysis was used to determine the hazard ratio (HR) of prostate cancer recurrence for each gene. The Cox proportional hazards assumption was also evaluated and confirmed. These analyses were performed using the R survival package. The graph includes the hazard ratio, 95% CI and p-values. Based on the p-values, we also determined the genes contained in the characteristic genomes SigCut1, SigCut2 and SigCut3.

In FIG. 10, the TCGA sub-database was used. (A) All tumors were scored using the Cox coefficients for SigCut1, SigCut2, and SigCut3. The figure shows the time-dependent AUC and corresponding recurrence status of the three characteristic genomes over the follow-up period. (B-D) SigCut1, SigCut2, and SigCut3 were associated with biochemical recurrence. (E) Stratification of prostate cancers at high risk of recurrence using the Q1, median, cutpoint and Q3 values of the SigCut3 score. The number of at-risk individuals over the specified follow-up period is included. Kaplan-Meier analysis and the log-rank test were performed using the R survival package.
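Scoring a tumor with Cox coefficients can be sketched as a weighted sum of its binary gene calls. The Python below assumes this linear form; the coefficient values are placeholders chosen for illustration and are not the coefficients reported in the patent's tables.

```python
def signature_score(binary_calls, coefficients):
    """Sum of coefficient * binary call over the signature's genes.

    binary_calls: dict gene -> 0/1 (1 = expression beyond the gene's cutpoint).
    coefficients: dict gene -> Cox coefficient (placeholder values below).
    Genes missing from binary_calls are treated as 0.
    """
    return sum(coef * binary_calls.get(gene, 0) for gene, coef in coefficients.items())

# Placeholder coefficients for the three SigCut1 genes (hypothetical values).
sigcut1_coefficients = {"MGAT4B": 1.0, "AURKA": 0.5, "OIP5": 0.25}

tumor = {"MGAT4B": 1, "AURKA": 0, "OIP5": 1}
score = signature_score(tumor, sigcut1_coefficients)

# A tumor is called signature-positive when its score exceeds a chosen cutpoint.
hypothetical_cutpoint = 1.0
is_positive = score > hypothetical_cutpoint
```

The resulting scores can be dichotomized at a cutpoint, the median or a quartile, as in the stratifications shown in the figures.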

In FIG. 11, prostate cancers in the TCGA sub-database were stratified by risk of biochemical recurrence using the Q1, median and Q3 values of the SigMuc1NW score.

In FIG. 12, gene expression data for all 15 component genes were obtained from the MSKCC sub-database within cBioPortal. The gene expression data for this cohort were obtained by DNA microarray. mRNA levels are compared between normal and prostate cancer tissue (A), primary and metastatic prostate cancer (B), and non-recurrent and recurrent prostate cancer (C). The figure also indicates the number of cancers in each group. Statistical analysis was performed using a two-tailed Student's t-test. *p < 0.05, **p < 0.01 and ***p < 0.001.

In FIG. 13, the follow-up data and the mRNA expression data for all 15 genes were obtained from the MSKCC database. SigMuc1NW1 contains 10 genes. The figure shows the time-dependent AUC obtained (A). Prostate cancers at high risk of recurrence were stratified using the cutpoint (B), Q1 (C), median (D) and Q3 (E) of the SigMuc1NW1 score. The number of prostate cancers at risk during follow-up is also shown in the figure.

In FIG. 14, SigMuc1NW1 was significantly associated with reduced DFS and OS in prostate cancer patients of the TCGA clinical cohort, with SigMuc1NW1 gene expression defined by SD levels. Kaplan-Meier analysis and the log-rank test were performed using the tools supplied by cBioPortal.

The following detailed description of the technical solution of the present invention includes:

1. Genes closely related to the 9-gene characteristic genome

Biochemical recurrence (BCR) occurs in 30-40% of patients after radical prostate cancer surgery, and approximately 40% of those patients present with metastatic cancer. Assessing the risk of biochemical recurrence helps in developing personalized treatment regimens. We recently constructed a 9-gene characteristic genome derived from the molecular network of the MUC1 gene; in the TCGA sub-database it effectively predicts biochemical recurrence, with a sensitivity of 34.8%, a specificity of 83.6% and a median months disease-free (MMDF) of 73.36 months (p = 5.57e-5). Biochemical recurrence results from alterations in multiple genes and pathways. The inventors therefore reasoned that a more effective characteristic genome could be obtained by analyzing the transcriptome changes associated with the 9-gene characteristic genome. To investigate this possibility, the inventors analyzed the TCGA sub-database in cBioPortal using the strategy in FIG. 1, examining the gene transcription closely related to the 9-gene characteristic genome. Of 492 prostate cancer patients, 100 were positive for the characteristic genome (FIG. 1). Comparing mean gene expression between these 100 positive prostate cancers and the other 392 negative cancers, we obtained a total of 696 differentially expressed genes (DEGs) (q < 0.001) (Table 1, which lists the DEGs for the 9-gene characteristic genome in the TCGA sub-database). These DEGs included 416 down-regulated genes and 280 up-regulated genes (FIG. 1; Table 1). Enrichment analysis of the DEGs using the KEGG (kegg.sets.hs) dataset with the gage package in R revealed that the up-regulated genes are mainly involved in regulating the cell cycle, oocyte meiosis and progesterone-mediated oocyte maturation, while genes mediating cell junctions and other functions were down-regulated.
Similarly, analysis with the Gene Ontology (GO, go.sets.hs) dataset showed that the up-regulated genes are involved in regulating cell cycle progression, DNA metabolism and other processes associated with cell proliferation, while the down-regulated genes are involved in mediating cell junctions, extracellular processes and other cellular processes. Pathway enrichment analysis of the 696 DEGs using the Reactome package in R showed that these genes regulate the G1 and M phases, DNA replication and chromatid segregation pathways of the cell cycle. Taken together, the above analyses show that the 696 DEGs are associated with the progression of prostate cancer.
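The q < 0.001 threshold refers to p-values adjusted for multiple testing. The source does not state how the q-values were computed; the Python sketch below assumes the widely used Benjamini-Hochberg adjustment, and the gene names and p-values are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q

def select_degs(gene_pvalues, q_threshold=0.001):
    """Return the genes whose adjusted p-value falls below the threshold."""
    genes = list(gene_pvalues)
    q = bh_qvalues([gene_pvalues[g] for g in genes])
    return [g for g, qv in zip(genes, q) if qv < q_threshold]

# Hypothetical per-gene p-values from a two-group comparison.
toy_q = bh_qvalues([0.01, 0.04, 0.03, 0.005])
toy_degs = select_degs({"GENE_A": 1e-6, "GENE_B": 0.2, "GENE_C": 2e-4})
```

In the toy example only GENE_A and GENE_C pass, because their adjusted values stay below 0.001 after correction for three tests.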

TABLE 1

2. 15-Gene signature genome SigMuc1NW was constructed to predict biochemical recurrence (BCR) after Radical Prostatectomy (RP).

We subsequently analyzed the effect of the 696 differentially expressed genes (DEGs) on biochemical recurrence using the TCGA sub-database. The characteristic genome was derived from the 696 DEGs by direct covariate selection in this cohort, whose patients had all undergone radical prostatectomy (cBioPortal). Given the heterogeneity of prostate cancer, we reasoned that these DEGs may affect biochemical recurrence when their expression crosses a threshold level. For each down-regulated DEG, we grouped the prostate cancers in which its expression was more than 1.5 standard deviations below the mean of the reference population (-1.5SD); for each up-regulated DEG, we grouped the cancers in which its expression was more than 2 standard deviations above that mean (+2SD) (FIG. 1). The reference population was either all tumors in the dataset or the tumors that are diploid for the gene (http://www.cbioportal.org/faq.jsp). The rearranged dataset, containing for each patient the down-regulated gene data, the up-regulated gene data, the follow-up period and the recurrence status, was then analyzed by regularized covariate selection using Elastic-net logistic regression in the R glmnet package (FIG. 1). To control how many highly correlated covariates were selected, we set the mixing parameter α in the Elastic-net analysis to 0.2 or 0.8. 10-fold cross-validation was used in all selection settings. As expected, more covariates were selected at α = 0.2 (n = 17) than at α = 0.8 (n = 5) (FIG. 2). We also performed covariate selection with a different setting (s = 0.5), which yielded more covariates than the α = 0.2 setting. We then removed the DEGs with a coefficient < 0.01 in the s = 0.5 setting and the DEGs with a coefficient < 0.001 in the α = 0.2 setting. In this way we obtained a 15-gene characteristic genome, SigMuc1NW (NW refers to network). This characteristic genome includes all 5 genes selected at α = 0.8, the 14 genes selected at α = 0.2 (which include all 5 genes selected at α = 0.8) and the 15 genes selected at s = 0.5 (which include all 14 genes selected at α = 0.2) (Table 2).

TABLE 2

Wherein: a, gene down-regulated at -1.5 SD; gene up-regulated at +2 SD; NA, not available.

Of the 15 genes, only SLCO2A1 and CGNL1 were down-regulated; the remaining genes were up-regulated (Table 1). No role in prostate cancer or in tumorigenesis in general has been reported for five genes: CGNL1, SUPV3L1, TATDN2, CASKIN1 and GOLGA7B (Table 2). Six genes (SLCO2A1, MGAT4B, SLC25A33, MCCC1, OIP5 and CTHRC1) have been reported to affect tumorigenesis in other cancer types, but not in prostate cancer (Table 2). OIP5 (Opa interacting protein 5) is a cancer-testis antigen and has been reported to be a tumor-associated antigen (TAA) in other cancer types. Its aberrant expression in prostate cancer suggests that OIP5 is also likely to be a tumor-associated antigen for prostate cancer. The remaining four genes, VAV2 (vav guanine nucleotide exchange factor 2), ASNS (asparagine synthetase), DNMT3B (DNA methyltransferase 3 beta) and AURKA (aurora kinase A), not only promote prostate carcinogenesis but also function in progression to castration-resistant prostate cancer (CRPC). VAV2 is a co-activator of the androgen receptor (AR) and maintains androgen receptor signaling after androgen deprivation therapy (ADT). It also promotes angiogenesis and metastasis. AURKA plays an important role in mitosis and promotes the development of neuroendocrine prostate cancer following ADT. DNMT3B may modulate epigenetic events to promote progression to CRPC. In summary, the available evidence supports the association of the SigMuc1NW characteristic genome with prostate cancer recurrence.

Consistent with this, univariate Cox proportional hazards (PH) analysis showed that all component genes, at the defined expression levels (-1.5SD for down-regulation and +2SD for up-regulation), were very effective in predicting biochemical recurrence of prostate cancer (Table 2, which lists the component genes of SigMuc1NW and their association with prostate cancer recurrence). The PH assumption of the Cox model was confirmed for all genes except TATDN2 and OIP5. The predictions for certain genes (MGAT4B, ASNS, DNMT3B and OIP5) were strong (Table 3), especially considering that each was based on a single gene.

TABLE 3

Where: a, univariate Cox analysis in the TCGA sub-database (n = 492); b, Cox coefficient; c, hazard ratio; d, confidence interval; e, mean gene expression < -1.5 SD relative to the reference population; f, mean gene expression > 2 SD relative to the reference population. *p < 0.05; **p < 0.01; ***p < 0.001.
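As an illustration of the SD-based definition used above, the following minimal Python sketch flags a gene as altered when its expression lies below -1.5 SD (for down-regulated genes) or above +2 SD (for up-regulated genes) of a reference population. The function name, the reference values and the use of the sample standard deviation are assumptions made for illustration only; they are not taken from the TCGA data or from cBioPortal.

```python
from statistics import mean, stdev

DOWN_SD = -1.5  # down-regulation threshold stated in the text
UP_SD = 2.0     # up-regulation threshold stated in the text


def call_alteration(value, reference, direction):
    """Return 1 if `value` is altered relative to `reference`, else 0.

    `direction` is "down" for down-regulated genes (e.g. SLCO2A1, CGNL1)
    and "up" for the remaining genes. Hypothetical helper, not from the source.
    """
    z = (value - mean(reference)) / stdev(reference)
    if direction == "down":
        return int(z < DOWN_SD)
    if direction == "up":
        return int(z > UP_SD)
    raise ValueError("direction must be 'up' or 'down'")


# Hypothetical reference expression values (arbitrary units).
reference = [8.0, 9.0, 10.0, 11.0, 12.0]
print(call_alteration(14.0, reference, "up"))    # z is about 2.53
print(call_alteration(7.0, reference, "down"))   # z is about -1.90
```

A tumor would then be called positive for a given gene when the function returns 1 for that gene's direction of change.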

In support of our selection of these genes, the 15 gene changes showed overlapping features (FIG. 3, top panel), and their expression could be clustered together (FIG. 3, bottom panel). The changes defined by down/up-regulation relative to the standard deviation (SD) matched the clustering derived from gene expression (FIG. 3). This also validates our covariate selection. Importantly, prostate cancer patients presenting with these changes are indeed at risk of recurrence, i.e., these patients are predominantly in the prostate cancer recurrence group (FIG. 3, see the "disease free state" diagram). Tumors positive for SigMuc1NW were also strongly associated with a decrease in disease-free survival (DFS) (FIG. 4A, p = 1.12e-12). The sensitivity of this association was 56.4% and the specificity 72.6%, a significant increase in sensitivity compared with the initially reported 9-gene signature (sensitivity 34.8%, specificity 83.6%; p = 5.57e-5). Ten patients in the TCGA sub-database died of cancer; notably, 8 of these 10 deaths occurred in SigMuc1NW-positive patients (FIG. 4B, p = 0.00212), consistent with the inference that VAV2, ASNS, DNMT3B and AURKA promote castration-resistant prostate cancer (CRPC). As expected, SigMuc1NW overlapped with the 9-gene signature used to select the 696 differentially expressed genes (FIG. 5). The association of the 9-gene signature with biochemical recurrence of prostate cancer was significantly enhanced when it was combined with SigMuc1NW (FIG. 6A, C), and the combination was significantly associated with a decrease in overall survival (OS) (FIG. 6B).

3. SigMuc1NW effectively distinguished recurrent prostate cancer from prostate cancer without biochemical recurrence

To test the effectiveness of SigMuc1NW in distinguishing recurrent prostate cancer from prostate cancer without biochemical recurrence, we weighted each of the 15 gene changes by its Cox coefficient (Table 3). The cumulative SigMuc1NW score of an individual patient was then calculated as $\sum_{i=1}^{n} f_i$ ($f_i$: Cox coefficient of gene $i$; $n = 15$). We used time-dependent ROC (tROC) analysis to assess the sensitivity and specificity of SigMuc1NW in predicting biochemical recurrence. These scores discriminated recurrent prostate cancer with a time-dependent area under the curve (tAUC) of 74.9% at 11.5 and 32.1 months, and 69.7% at 48.4 months (FIG. 7A). This further reveals that SigMuc1NW is particularly effective for predicting early biochemical recurrence (BCR). We determined the cut point (cutpoint) of the SigMuc1NW score for distinguishing recurrent from non-recurrent prostate cancer using maximally selected rank statistics in the maxstat package in R (FIG. 8) and converted the score into a binary code: a score ≤ 1.7833 (cut point, FIG. 8) was assigned "0" and a score > 1.7833 was assigned "1". Biochemical recurrence was markedly more frequent in prostate cancers with scores above the cut point than in those with scores at or below it (FIG. 7B). Cut-point-positive tumors (FIG. 7B; MMDF 33.1, 95% CI 30.9-73.4) recurred even sooner than SigMuc1NW-positive prostate cancers (FIG. 4A; MMDF 63.2, 95% CI 40-77.3). The cut point therefore not only makes SigMuc1NW easier to apply clinically but also enhances its predictive power. In addition, the mean and third-quartile (Q3) scores can stratify patients at high risk of biochemical recurrence with an effect comparable to that of SigMuc1NW (compare FIG. 7C, D with FIG. 4A). The mean and Q3 scores covered 48 and 46 recurrent prostate cancers, respectively (FIG. 7C, D), exceeding the 41 recurrent prostate cancers marked by the cut point (FIG. 7A).
Thus, the mean (0.918), Q3 (1.019) and the cut point (1.7883) can be used in combination to predict biochemical recurrence across a range of recurrence risks. After adjusting for age at diagnosis, post-prostatectomy Gleason score, surgical margin status and TNM tumor stage, we further confirmed that SigMuc1NW (p = 1.62e-4), the cut point (p = 2.05e-5), the mean (p = 1.19e-4) and Q3 (p = 1.67e-4) were independent risk factors for prostate cancer recurrence (Table 4). When the World Health Organization (WHO) prostate cancer grading system was used instead of the Gleason score, the p-values for SigMuc1NW (p = 0.0532) and Q3 (p = 0.0576) approached the 0.05 significance threshold after adjusting for the above clinical factors, while the cut point (p = 0.00395) and the mean (p = 0.0187) remained independent risk factors for biochemical recurrence of prostate cancer.
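The combined use of the mean, Q3 and cut point described above can be sketched as a simple tiering rule. The following Python fragment is only an illustration: the tier labels and the function name are hypothetical, and the cut point value used is the one given where the binary code is defined (1.7833).

```python
MEAN_SCORE = 0.918   # mean SigMuc1NW score reported in the text
Q3_SCORE = 1.019     # third-quartile score reported in the text
CUT_POINT = 1.7833   # cut point as given where the binary code is defined


def risk_tier(score):
    """Map a SigMuc1NW score to a hypothetical recurrence-risk tier."""
    if score > CUT_POINT:
        return "above cut point"
    if score > Q3_SCORE:
        return "above Q3"
    if score > MEAN_SCORE:
        return "above mean"
    return "at or below mean"


for s in (0.5, 0.95, 1.5, 2.0):
    print(s, risk_tier(s))
```

Ordering the checks from the highest threshold downward ensures each tumor is placed in the most stringent tier it satisfies.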

TABLE 4

Where: 1, SigMuc1NW; 2, SigMuc1NW-derived cut point; 3, age at diagnosis; 4, radical prostatectomy Gleason score; 5, seminal vesicle invasion; 6, surgical margin; 7, tumor stage (0 for ≤T2; 1 for T3 and T4); HR, hazard ratio; CI, confidence interval; NA, not available.

4. Improving the prediction efficiency of SigMuc1NW

To further demonstrate the effectiveness and robustness of SigMuc1NW, we analyzed the efficacy of the signature using actual gene expression data rather than standard deviation (SD)-based distributions. For this purpose, RNA sequencing data for all 15 SigMuc1NW genes were retrieved from the TCGA sub-database, and for each gene a cut point was estimated that distinguishes its expression in recurrent prostate cancer¹ (Table 5). As described above, a binary code was assigned to every tumor for each of the 15 genes: tumors with expression above the cut point were coded "1", except for the down-regulated genes SLCO2A1 and CGNL1, for which tumors with expression below the cut point were coded "1". Univariate Cox proportional hazards (PH) analysis was performed on all genes, and the proportional hazards assumption was tested. All 15 genes, as defined by their cut points, significantly predicted biochemical recurrence of cancer (FIG. 9). After adjustment for age, post-prostatectomy Gleason score, surgical margin status and TNM tumor stage, SLCO2A1 (p = 0.0369), SUPV3L1 (p = 0.000798), TATDN2 (p = 0.000835), MGAT4B (p = 0.0128), VAV2 (p = 0.0024), SLC25A33 (p = 0.0297) and OIP5 (p = 0.00638) remained independent risk factors for biochemical recurrence of cancer (p = 0.00102).

TABLE 5

Genes	Cutpoint²	p-value	Coef³	p-value
SLCO2A1	497.3292	0.09128	0.7967	0.00499**
CGNL1	3066.229	0.004126**	0.7966	0.000372***
SUPV3L1	545.8928	0.007953**	0.7992	0.000187***
TATDN2	1756.057	0.002471**	0.8731^#	8.48e-5***
MGAT4B	1818.718	6.389e-5***	1.0331	2.61e-6***
VAV2	1489.06	0.000547***	0.9402	9.94e-6***
SLC25A33	297.5508	0.2522	0.8503	0.0218*
MCCC1	1233.159	0.001077**	1.0179	1.2e-5***
ASNS	1041.086	0.01123*	1.0544	0.000109***
CASKIN1	106.4046	0.02646*	0.7006	0.00125**
DNMT3B	61.4086	0.008576**	0.9082	0.000175***
AURKA	81.1249	3.807e-5***	1.0223	1.12e-6***
OIP5	16.4317	4.237e-7***	1.242^#	2.64e-8***
CTHRC1	180.8622	0.01389*	0.7608	0.000537***
GOLGA7B	23.2022	0.01249*	0.7623	0.000581***

Where: 1, RNA sequencing data for the SigMuc1NW component genes were obtained from the TCGA sub-database (cBioPortal); 2, cut points were estimated using maximally selected rank statistics in R; 3, coefficients for biochemical recurrence were determined using univariate Cox proportional hazards analysis; #, the test of the PH assumption gives p < 0.05.
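The per-gene binary coding described for Table 5 can be illustrated with a short Python sketch. The cut points are copied from Table 5; the function name and the example expression values are hypothetical, and the rule that up-regulated genes are coded "1" when expression exceeds the cut point is assumed from the context.

```python
# Cut points from Table 5 (RNA-seq expression units as reported there).
CUT_POINTS = {
    "SLCO2A1": 497.3292, "CGNL1": 3066.229, "SUPV3L1": 545.8928,
    "TATDN2": 1756.057, "MGAT4B": 1818.718, "VAV2": 1489.06,
    "SLC25A33": 297.5508, "MCCC1": 1233.159, "ASNS": 1041.086,
    "CASKIN1": 106.4046, "DNMT3B": 61.4086, "AURKA": 81.1249,
    "OIP5": 16.4317, "CTHRC1": 180.8622, "GOLGA7B": 23.2022,
}
# The two down-regulated genes are coded "1" when expression is BELOW the cut.
DOWN_REGULATED = {"SLCO2A1", "CGNL1"}


def binary_code(gene, expression):
    """Return the 0/1 code of one gene in one tumor (hypothetical helper)."""
    cut = CUT_POINTS[gene]
    if gene in DOWN_REGULATED:
        return int(expression < cut)
    return int(expression > cut)


# Hypothetical expression values for one tumor.
tumor = {"OIP5": 20.0, "AURKA": 50.0, "CGNL1": 2500.0}
print({gene: binary_code(gene, value) for gene, value in tumor.items()})
```

Keeping the direction of change in a separate set makes it explicit that only SLCO2A1 and CGNL1 are scored on low expression.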

Using the obtained Cox coefficients (Table 5), each cut-point-positive tumor was assigned the corresponding coefficient value. Based on the predictive strength indicated by the p-values (FIG. 9), we further defined three sub-signatures, SigCut1, SigCut2 and SigCut3 (FIG. 9). SigCut1, SigCut2 and SigCut3 scores were then calculated for all tumors as $\sum_{i=1}^{n} f_i$ ($f_i$: Cox coefficient of gene $i$; $n = 3$, 6 and 15, respectively). With tAUC > 70%, all three sub-signatures effectively distinguished recurrent prostate cancer (FIG. 10A). The cut points (with p-values) of the sub-signatures were: SigCut1, 1.0331 (p = 6.166e-8); SigCut2, 4.0135 (p = 1.005e-11); and SigCut3, 5.4067 (p = 7.97e-15). The corresponding binary code of each sub-signature was then assigned to all tumors for survival analysis. All three sub-signatures were significantly associated with a reduction in disease-free survival (DFS), with SigCut2 and SigCut3 being more effective (FIG. 10B-D). Nevertheless, all three can be used to predict biochemical recurrence (BCR), in terms of both the number of recurrent tumors and the median months disease-free (MMDF). Their sensitivity/specificity was 71.4%/63.9% for SigCut1, 41.8%/87.5% for SigCut2 and 67.7%/75.7% for SigCut3 (FIG. 10B-D). Thus, these three sub-signatures can be used together to predict recurrent prostate cancer.
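As a worked illustration of the scoring formula, the following Python sketch computes a SigCut1 score from per-gene binary codes. The gene membership of SigCut1 (MGAT4B, AURKA and OIP5) is taken from claim 1, the coefficients from Table 5 and the cut point from the paragraph above; the function names are hypothetical, and the rule "score > cut point is positive" is assumed to carry over from the SigMuc1NW coding.

```python
# Cox coefficients of the SigCut1 genes (Table 5).
SIGCUT1_COEF = {"MGAT4B": 1.0331, "AURKA": 1.0223, "OIP5": 1.242}
SIGCUT1_CUT_POINT = 1.0331  # cut point reported for SigCut1


def sigcut1_score(codes):
    """Sum the coefficients of the genes whose binary code is 1."""
    return sum(SIGCUT1_COEF[gene] for gene, code in codes.items() if code == 1)


def sigcut1_positive(codes):
    """Binary code of the sub-signature: 1 if the score exceeds the cut point."""
    return int(sigcut1_score(codes) > SIGCUT1_CUT_POINT)


# Hypothetical tumors with different combinations of positive genes.
only_mgat4b = {"MGAT4B": 1, "AURKA": 0, "OIP5": 0}
two_genes = {"MGAT4B": 1, "AURKA": 1, "OIP5": 0}
print(sigcut1_score(only_mgat4b), sigcut1_positive(only_mgat4b))
print(sigcut1_score(two_genes), sigcut1_positive(two_genes))
```

Under these assumptions a tumor positive for MGAT4B alone scores exactly at the cut point and is coded "0", whereas a tumor positive for OIP5 alone or for any two of the three genes is coded "1".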

Q1 (1.647), median (3.589) and Q3 (6.386) scores were all effective in stratifying the risk of biochemical recurrence of prostate cancer, with the following sensitivity/specificity/median months disease-free (MMDF)/p values: Q1, 93.4%/31.8%/81.2/6.76e-6; median, 80.2%/56.9%/66.9/6.73e-11; and Q3, 56%/82%/40/0 (FIG. 11). Used together, the Q1, median, Q3 and SigCut3 cut points provide a very effective assessment system for stratifying recurrent and non-recurrent prostate cancer, with only a few recurrent cases having a tumor score below Q1 (FIG. 10E).

Furthermore, SigCut3 (FIG. 10D) is significantly more effective than SigMuc1NW constructed using the standard deviation (SD) (FIG. 4A). After adjustment for age at diagnosis, post-prostatectomy Gleason score, surgical margin status and TNM tumor stage, SigCut1 (p = 0.00308), SigCut2 (p = 1.55e-5) and SigCut3 (p = 2.97e-6) each independently predicted biochemical recurrence of cancer. All three sub-signatures are associated with adverse features of prostate cancer. For advanced-stage tumors (T3 and T4), the odds ratio/95% CI was 1.78/1.51-2.12 (p = 2.39e-11) for SigCut1, 1.55/1.37-1.77 (p = 1.33e-11) for SigCut2 and 1.33/1.23-1.44 (p = 8.47e-13) for SigCut3; for Gleason score 8-10, the odds ratios/95% CI were 2.19/1.86-2.6 (p < 2e-16), 1.84/1.62-2.1 (p < 2e-16) and 1.48/1.37-1.61 (p < 2e-16), respectively. Together, these results confirm the effectiveness of SigMuc1NW.

5. Further validation of SigMuc1NW

Of the 13 prostate cancer databases in cBioPortal, 4 contained mRNA data. Primary prostate cancer data were from the TCGA sub-database and the Broad/Cornell (Nat Genet 2012) database; metastatic prostate cancer data were from the Fred Hutchinson (Nat Med 2016) and SU2C/PCF Dream Team cohorts (cBioPortal). These datasets also provide gene expression analysis based on the standard deviation (SD) distribution. Analysis of the data from these databases showed that the proportion of SigMuc1NW-positive tumors was significantly higher in metastatic prostate cancer than in primary prostate cancer (Table 6). Table 6 shows the up-regulation of SigMuc1NW in metastatic prostate cancerⁱ.

TABLE 6

Where: i, data obtained from cBioPortal; ii, TCGA database; p = 0.0002 compared with primary prostate cancer using Fisher's exact test.

The MSKCC (Cancer Cell 2010) database in cBioPortal contains data for 216 prostate cancers, whose mRNA expression was measured by DNA microarray; these data were normalized by comparing prostate cancer with normal prostate tissue (cBioPortal). This cohort contains follow-up information and thus supports survival analysis. To further validate the effectiveness of SigMuc1NW, which was constructed using RNA sequencing data from the TCGA sub-database, we extracted mRNA expression data for all 15 component genes from the MSKCC database along with the corresponding clinical information. Patient samples can be classified as normal prostate (n = 29), primary prostate cancer (n = 149), recurrent prostate cancer (n = 36) and metastatic prostate cancer (n = 9) (cBioPortal). We demonstrated that CGNL1 expression was significantly reduced in primary prostate cancer compared with normal prostate tissue; furthermore, expression of SLCO2A1 and CGNL1, the two down-regulated genes of SigMuc1NW, was significantly reduced in metastatic compared with localized prostate cancer, and differed significantly between non-recurrent and recurrent prostate cancer (FIG. 12A-C). These comparisons (FIG. 12A-C) support the validity of SigMuc1NW; the up-regulated genes of SigMuc1NW were likewise expressed at significantly higher levels.

Using the approach described above, we also obtained cut points for all 15 genes and assigned binary codes. We then used Cox proportional hazards (Cox PH) regression to determine the association of individual genes with biochemical recurrence (Table 7; Table 7 shows the cut points and Cox coefficients of the SigMuc1NW genes in prostate cancer patients in the MSKCC database¹). Apart from MCCC1, which was inversely associated with recurrence, and 4 genes that were not significantly associated with disease-free survival (DFS), the remaining 10 genes significantly predicted the risk of biochemical recurrence, with CGNL1 and CTHRC1 being the strongest predictors (Table 7). Accordingly, these 10 genes form a sub-signature, SigMuc1NW1. As described above, SigMuc1NW1 scores were calculated for all tumors using the coefficients (Table 7). tROC analysis showed tAUC values from 76.6% to 82.5% (FIG. 13A). SigMuc1NW1 effectively distinguished recurrent from non-recurrent prostate cancer over follow-up periods from 18.4 to 65 months (FIG. 13A); its efficacy was similar to that of SigMuc1NW in identifying recurrent prostate cancer in the TCGA sub-database (FIG. 10A). Furthermore, using binary codes derived from the SigMuc1NW1 score at Q1 (0), the median (1.805), Q3 (3.727) and the cut point (6.2136), we stratified recurrent prostate cancer very effectively (FIG. 13B-E). The sensitivity/specificity/PPV (positive predictive value) was 36.1%/98.1%/86.7% for the cut point, 97.2%/35.6%/34.3% for Q1, 75%/59.6%/39.1% for the median and 52.8%/84.6%/54.3% for Q3 (FIG. 13B-E). The positive predictive value of the cut point was very high (86.7%). Taken together, prostate cancer recurrence in patients in the MSKCC database can be effectively predicted by combining Q1, the median, Q3 and the cut point. We also confirmed the validity of SigMuc1NW using the TCGA sub-database.
In a reverse validation, we demonstrated that SigMuc1NW1 was strongly associated with biochemical recurrence and significantly associated with a reduction in overall survival (OS) in the TCGA sub-database (FIG. 14A, B). Taken together, these results validate the effectiveness of SigMuc1NW and SigMuc1NW1 in predicting prostate cancer recurrence.
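The sensitivity, specificity and PPV reported for the MSKCC cohort follow the usual confusion-matrix definitions, which the Python sketch below spells out. The counts in the example are hypothetical: they were chosen only because, with the 36 recurrent cases stated in the text, they reproduce the percentages reported for the cut point; the source does not give the underlying counts.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity and positive predictive value from 2x2 counts."""
    return {
        "sensitivity": tp / (tp + fn),  # recurrent tumors called positive
        "specificity": tn / (tn + fp),  # non-recurrent tumors called negative
        "ppv": tp / (tp + fp),          # positive calls that actually recur
    }


# Hypothetical counts (assumption): 36 recurrent and 104 non-recurrent tumors.
metrics = diagnostic_metrics(tp=13, fp=2, fn=23, tn=102)
for name, value in metrics.items():
    print(f"{name}: {value * 100:.1f}%")
```

The example shows why a stringent cut point yields a high PPV at the cost of sensitivity: few tumors are called positive, but most of those calls are correct.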

TABLE 7

Gene	Cut point²	p-value	Coef³	HR	95% CI	p-value
SLCO2A1	8.155098	0.7073	0.6364	1.89	0.7835-4.558	0.157
CGNL1	10.02132	0.004758**	1.4679	4.34	2.084-9.038	8.8e-5***
SUPV3L1	7.655546	0.7029	-0.6931	0.5	0.2277-1.098	0.0841
TATDN2	7.755133	0.969	-0.5149	0.5976	0.2476-1.442	0.252
MGAT4B	8.536576	0.01469*	1.3245	3.76	1.833-7.712	0.000302***
VAV2	7.801308	0.2076	0.8258	2.284	1.184-4.405	0.0138*
SLC25A33	8.653056	1	0.4752	1.608	0.6248-4.14	0.325
MCCC1	7.789343	0.2982	-1.0768	0.3407	0.1467-0.7911	0.0122*
ASNS	7.946625	0.01918*	1.1815	3.259	1.567-6.78	0.00157**
CASKIN1	8.142854	0.04935*	1.0985	3	1.529-5.886	0.0014**
DNMT3B	7.199673	0.06077	1.0373	2.822	1.385-5.749	0.00428**
AURKA	7.215284	0.03781*	1.0552	2.873	1.435-5.75	0.00288**
OIP5	6.026397	0.05557	0.9789	2.662	1.374-5.156	0.00372**
CTHRC1	7.827664	0.0001814***	1.631	5.109	2.4-10.88	2.33e-5***
GOLGA7B	7.534541	0.1695	1.1095	3.033	1.371-6.71	0.00617**

Where: 1, DNA microarray data for the SigMuc1NW component genes were obtained from the MSKCC database (cBioPortal); 2, cut points were estimated using maximally selected rank statistics in R; 3, coefficients for biochemical recurrence were determined using univariate Cox proportional hazards analysis.

6. SigMuc1NW and SigMuc1NW1 are associated with decreased disease-free survival (DFS) and overall survival (OS) in multiple cancer types

We analyzed the value of SigMuc1NW and SigMuc1NW1 in predicting disease-free survival (DFS) and overall survival (OS) in other cancer types. The two signatures were significantly associated with a reduction in overall survival in two large breast cancer cohorts, low-grade glioma, head and neck squamous cell carcinoma (SigMuc1NW1 only), clear cell renal cell carcinoma (ccRCC), papillary renal cell carcinoma (pRCC), hepatocellular carcinoma, sarcoma, thyroid carcinoma and uterine corpus endometrial carcinoma (Table 8; Table 8 shows the association of SigMuc1NW and SigMuc1NW1 with reduced overall survival in various cancers^a). These associations were significant, with p-values ranging from 1e-5 to 1e-7 (Table 8). The two signatures may also predict a reduction in disease-free survival (DFS) in low-grade glioma, clear cell renal cell carcinoma, chromophobe cell carcinoma, hepatocellular carcinoma and sarcoma (Table 9; Table 9 shows the association of SigMuc1NW and SigMuc1NW1 with reduced disease-free survival in various cancers^a). SigMuc1NW1 is associated with decreased DFS in sarcoma and, compared with SigMuc1NW, is associated with disease recurrence in more cancer types (Table 9). Collectively, these data confirm the clinical significance of SigMuc1NW and SigMuc1NW1.

TABLE 8

Where: a, all cancer datasets are from the cBioPortal database; b, numbers of signature-positive (+) and signature-negative (-) cases; total number/number of relapses are given; MMDF, median months disease-free.

TABLE 9

The present invention develops a new method of analyzing multigene-related transcriptomes to obtain a gene signature that can be used to diagnose tumor recurrence. This is the first such signature based entirely on multigene transcriptome analysis (696 genes) rather than on single-gene analysis. This new approach and comprehensive analysis method yielded the 15-gene signature. In this signature, 73.3% (11/15) of the genes have not been reported to be associated with prostate cancer. These 11 novel prostate cancer genes include MGAT4B and OIP5. The former may play a role in altering the glycosylation of tumor proteins, a very important alteration in tumorigenesis. Aberrant glycosylation of MUC1 has been well documented in tumorigenesis. Thus, the presence of MGAT4B in the 15-gene signature is consistent with the 9-gene MUC1 signature from which it is derived. The presence of OIP5 in SigMuc1NW indicates that this protein is a tumor-associated antigen (TAA) in prostate cancer. Tumor-associated antigens have been extensively studied in cancer diagnosis and therapy. Therefore, OIP5 has potential clinical applications in prostate cancer diagnosis and treatment.

Due to the complex nature of cancer progression, we chose not to focus on specific aspects of tumorigenesis, but rather to apply a recent machine learning method to assess the power of 696 genes to predict biochemical recurrence of prostate cancer. We thus constructed a signature comprising 15 genes. Although the construction of SigMuc1NW was not directed to a particular pathway, the signature may encompass multiple pathways. In addition to the potential effect of MGAT4B on protein glycosylation, the signature also contains genes encoding proteins with RNA helicase activity (SUPV3L1, Table 2) and DNA methyltransferase activity (DNMT3B, Table 2). These cellular processes are very important in gene expression and epigenetic changes, and their malignant alteration is an important manifestation of cancer progression. SigMuc1NW also contains genes that regulate cell proliferation. AURKA is increasingly regarded as an important regulator of mitosis and a key player in tumorigenesis, and it is considered a very important potential target gene in cancer therapy research. Interestingly, of the 15 genes of SigMuc1NW, only 4 have been reported to play a role in prostate cancer, and all 4 are able to promote the progression of castration-resistant prostate cancer (CRPC). Since both gene expression and epigenetics are significantly aberrantly altered in castration-resistant prostate cancer, the 15-gene signature may also predict progression to castration-resistant prostate cancer.

The inclusion of genes that function in multiple pathways may be the main reason why the signature is very effective in predicting tumor recurrence. SigMuc1NW and its sub-signatures are all effective in stratifying prostate cancer into groups based on the risk of biochemical recurrence (p = 0), and they also predict recurrent prostate cancer with tAUC > 75%. By combining the sub-signatures of SigMuc1NW in the analysis, sensitivity, specificity and PPV (positive predictive value) can reach very reliable levels, e.g., 97.2%, 98.1% and 86.7%, respectively (FIG. 13B-E). Taken together, this evidence strongly suggests that the signature constructed in the present invention will have very important clinical significance in predicting prostate cancer recurrence.

The method of the invention comprises the following steps:

cBioPortal

The cBioPortal database (http://www.cbioportal.org/index.do) contains the most complete and comprehensive genetic data for various cancer types. The TCGA sub-database covers genetic abnormalities, transcript expression determined by cDNA microarray or RNA sequencing, and detailed clinical features including disease outcome (relapse and death). The TCGA clinical prostate cancer database contains 492 patients with localized prostate cancer.

Establishing a characteristic genome of multiple genes

The largest TCGA sub-database (n = 499) in the cBioPortal database (http://www.cbioportal.org/index.do), which includes follow-up data for 492 patients, was used to obtain 696 differentially expressed genes. These 696 genes are closely related to the 9-gene signature from the MUC1 genomic network (q < 0.001). Clinical data such as recurrence during follow-up were also extracted. Elastic-net logistic regression in the glmnet package in R was used to select variables with a significant impact on BCR, with 10-fold cross-validation; the elastic-net mixing parameter α was set to 0.2, 0.5 and 0.8. When α = 0, elastic-net reduces to ridge regression, which does not perform covariate selection but shrinks the coefficients of correlated predictors toward each other; when α = 1, it reduces to lasso regression, which tends to select one covariate from a set of correlated covariates, making the signature more efficient. To enhance the selection of highly correlated variables while keeping the number of covariates to a minimum, we used α values from 0.2 to 0.8. Using this approach, we obtained a signature containing 15 genes.
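To make the role of the mixing parameter α concrete, the short Python sketch below evaluates the elastic-net penalty in the form used by glmnet, λ[(1 - α)/2 · ‖β‖₂² + α · ‖β‖₁], for several α values. It only illustrates the penalty term and is not a re-implementation of the model fitting; the function name and the coefficient vector are hypothetical.

```python
def elastic_net_penalty(beta, alpha, lam=1.0):
    """Elastic-net penalty: lam * ((1 - alpha) / 2 * ||beta||_2^2 + alpha * ||beta||_1)."""
    l1 = sum(abs(b) for b in beta)       # lasso part, drives coefficients to zero
    l2_sq = sum(b * b for b in beta)     # ridge part, shrinks correlated coefficients
    return lam * ((1 - alpha) / 2 * l2_sq + alpha * l1)


beta = [1.0, -2.0, 0.0]  # hypothetical coefficient vector
for alpha in (0.0, 0.2, 0.5, 0.8, 1.0):
    print(alpha, elastic_net_penalty(beta, alpha))
```

At α = 0 only the ridge term remains and at α = 1 only the lasso term remains, so intermediate values such as 0.2, 0.5 and 0.8 trade off grouping of correlated genes against sparsity.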

Assigning signature scores to patients/tumors

Individual component genes were tested for their efficacy in predicting biochemical recurrence using univariate Cox proportional hazards (PH) regression, and the Cox coefficient of each component gene was obtained. The PH assumption was also tested. The analysis was performed using the R survival package. The signature score of an individual patient is given by Sum(coef1 + coef2 + ... + coefn), where coef1 ... coefn are the coefficients of the individual genes.

Cut Point (Cutpoint) estimation

The cut points were obtained from the patients' signature scores using maximally selected rank statistics (maxstat package) in R. The cut point is used to distinguish between recurrent and non-recurrent prostate cancer. We also obtained RNA expression data determined by RNA sequencing from the TCGA sub-database and evaluated the efficacy of the resulting cut points in distinguishing between recurrent and non-recurrent prostate cancer.
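The idea behind cut point estimation can be sketched in plain Python as a scan over candidate cut points that keeps the one giving the largest two-group log-rank statistic. This is a simplified stand-in, not the maxstat implementation: it omits the p-value adjustment for having tested many cut points, and the function names, the `min_group` parameter and the example data are all hypothetical.

```python
def logrank_chi2(times, events, groups):
    """Two-group log-rank chi-square (groups are 0/1, events 1 = recurrence)."""
    obs_minus_exp, variance = 0.0, 0.0
    for t in sorted({u for u, e in zip(times, events) if e == 1}):
        at_risk = [(g, e, u) for u, e, g in zip(times, events, groups) if u >= t]
        n = len(at_risk)
        n1 = sum(1 for g, _, _ in at_risk if g == 1)
        d = sum(1 for _, e, u in at_risk if e == 1 and u == t)
        d1 = sum(1 for g, e, u in at_risk if g == 1 and e == 1 and u == t)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0


def best_cut_point(scores, times, events, min_group=2):
    """Return (cut, statistic) maximizing the log-rank statistic for score > cut."""
    best = (None, 0.0)
    for cut in sorted(set(scores))[:-1]:
        groups = [int(s > cut) for s in scores]
        n_high = sum(groups)
        if n_high < min_group or len(groups) - n_high < min_group:
            continue  # skip cuts that leave one group too small
        stat = logrank_chi2(times, events, groups)
        if stat > best[1]:
            best = (cut, stat)
    return best


# Hypothetical data: high scores recur early, low scores are censored late.
scores = [0.2, 0.4, 0.6, 0.8, 2.0, 2.2, 2.4, 2.6]
times = [60, 55, 70, 65, 10, 12, 8, 15]
events = [0, 0, 0, 0, 1, 1, 1, 1]
cut, stat = best_cut_point(scores, times, events)
print(cut, round(stat, 3))
```

Because the same data are used to choose and to test the cut point, the raw statistic is optimistic, which is why a dedicated procedure with adjusted p-values is preferable in practice.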

Regression analysis

Logistic regression was performed using the R language. Cox proportional hazards (Cox PH) regression analysis was performed using the R survival package. The PH assumption was also tested.

Enrichment analysis of pathways

The GAGE and Reactome packages in R were used for KEGG (Kyoto Encyclopedia of Genes and Genomes) and GO (Gene Ontology) pathway analysis of the differentially expressed genes.

Statistical analysis

Fisher's exact test was performed using GraphPad Prism 5 software. Kaplan-Meier survival analysis and the log-rank test were performed using the R survival package and the tools supplied by cBioPortal. Univariate and multivariate Cox regression analyses were performed using the R survival package. Time-dependent receiver operating characteristic (tROC) analysis was performed using the R timeROC package. Values with p < 0.05 were considered statistically significant.
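For readers unfamiliar with the Kaplan-Meier estimate underlying the disease-free survival curves and the median months disease-free (MMDF), a minimal Python sketch follows. It is an illustration of the estimator only, not the R survival implementation; the function names and follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    curve, surv = [], 1.0
    for t in sorted({u for u, e in zip(times, events) if e == 1}):
        n_at_risk = sum(1 for u in times if u >= t)
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        surv *= 1 - d / n_at_risk  # product-limit update
        curve.append((t, surv))
    return curve


def median_time(curve):
    """First time at which survival drops to 0.5 or below (None if never)."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical follow-up in months; event 1 = recurrence, 0 = censored.
times = [5, 10, 15, 20, 25, 30]
events = [1, 0, 1, 1, 0, 1]
curve = kaplan_meier(times, events)
print(curve)
print(median_time(curve))
```

Censored patients leave the risk set without lowering the curve, which is why the estimate differs from a simple proportion of recurrences.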

In summary, the gene combination for detecting the risk of cancer recurrence provided by the present invention has an advantage of effectively predicting the risk of recurrence of cancer such as prostate cancer.

The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all equivalent changes made by using the contents of the present specification and the drawings, or applied directly or indirectly to the related technical fields, are included in the scope of the present invention.

Claims

1. A biomarker for detecting the risk of prostate cancer recurrence, comprising at least one set of characteristic genomes from: SigCut1, SigCut2, SigCut3, and SigMuc1NW1;

the SigCut1 includes the following genes: MGAT4B, AURKA and OIP5;

2. The biomarker for detecting the risk of prostate cancer recurrence according to claim 1, wherein the genes in the biomarker include isoforms of the genes and family members of the genes.

3. The biomarker for detecting the risk of prostate cancer recurrence according to claim 1, wherein the biomarker is used to assess the risk of death in a prostate cancer patient.

4. The biomarker for detecting the risk of prostate cancer recurrence according to claim 1, wherein the biomarker is used to assess the risk of cancer recurrence at the time of primary diagnosis of prostate cancer.

5. The biomarker for detecting the risk of prostate cancer recurrence according to claim 1, wherein the biomarker is used to assess the risk of recurrence after radical prostate cancer therapy.