TW201339310A - Gene set for predicting post-surgery recurrence or metastasis risk in cancer patients and method thereof - Google Patents

Info

Publication number
TW201339310A
Authority
TW
Taiwan
Prior art keywords
gene
seq
combination
nucleic acid
acid sequence
Prior art date
Application number
TW101110194A
Other languages
Chinese (zh)
Other versions
TWI450968B (en)
Inventor
Ke-Shiuan Lynn
Lung-Kun Chen
Shang-Chi Lin
Original Assignee
Phalanx Biotech Group Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Phalanx Biotech Group Inc
Priority to TW101110194A
Publication of TW201339310A
Application granted
Publication of TWI450968B

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention provides a gene set for predicting the risk of post-surgery recurrence or metastasis in cancer patients, and a method of using these genes to make that prediction, especially in liver cancer. According to the present invention, the accuracy of predicting post-surgery recurrence in liver cancer patients is 76.47%, and the accuracy of predicting post-surgery metastasis is 82.61%. By accurately predicting the risk of recurrence and metastasis in liver cancer patients, the most suitable follow-up treatment can be evaluated.

Description

Gene combination and method for predicting the risk of postoperative recurrence or metastasis in cancer patients

The present invention relates to a gene combination and a method for predicting the risk of postoperative recurrence or metastasis in cancer patients, and in particular to a gene combination and method that use the concept of gene-gene interaction to predict that risk, especially in liver cancer.

Hepatocellular carcinoma (HCC), referred to here as liver cancer, is one of the most common cancers in Taiwan, and it consistently ranks near the top of worldwide statistics on cancer incidence and mortality. According to statistics, about 1 million people worldwide develop liver cancer each year, and 500,000 to 1 million die of it. Geographically, liver cancer is more common in developing countries, mainly in East Asia and Africa, and it affects men more often than women, at a ratio of about four to one.

According to the Bureau of Health Promotion of the Department of Health, the incidence of liver cancer in Taiwan is 74.3 per 100,000 people. Liver cancer is also the leading cause of cancer death, with about 3,700 patients dying of it each year. Understanding the current status, causes, and risk factors of liver cancer is therefore very important for early diagnosis and follow-up.

The causes of liver cancer fall into two main categories: exposure to chemical carcinogens and hepatitis infection. Chemical carcinogens include male hormones, female hormones, alcohol, and common environmental pollutants such as carbon tetrachloride, DDT, and dioxins. Of particular note is aflatoxin, which is produced when stored grain is contaminated by Aspergillus flavus; long-term ingestion is also an important cause of liver cancer. Epidemiological and animal studies have established that chronic infection with hepatitis B or hepatitis C virus is the most important risk factor for liver cancer. Beasley found that hepatitis B-positive carriers are 98 times more likely to develop liver cancer than those who test negative. Researchers have also found that the incidence of liver cancer in the general population of Taiwan is 10 to 30 per 100,000 people, whereas among male hepatitis B patients it is 200 to 812 per 100,000; if the disease progresses to cirrhosis, the annual incidence across Taiwan rises to 1,000 to 5,000 cases. How to prevent and reduce hepatitis virus infection, and how to treat hepatitis, are therefore important public health issues.

The most common treatments for liver cancer include surgical resection, transcatheter hepatic arterial embolization, percutaneous intratumoral ethanol injection, cryotherapy, and transplantation. Of these, only surgical resection and organ transplantation are the most effective. In most patients, however, the tumor is already unresectable at diagnosis because of poor liver function (more than 75% of patients have underlying chronic liver disease), disease in both lobes of the liver, or extrahepatic metastasis. Whether a patient is a candidate for resection depends on the stage of the cancer and on liver function, and only about 15% of liver cancer patients are suitable. If the tumor cannot be resected, the prognosis is poor, with an average survival of only a few months. Even among patients who undergo resection, the 5-year recurrence rate is as high as 63%. The 5-year survival rates of cancers such as liver, lung, esophageal, and gastric cancer do not exceed 30%, and that of liver cancer is lower still, at only 5% to 6%.

Because the recurrence rate within five years of surgical resection is very high (about 63%), a physician will consider liver transplantation or drug therapy when a patient relapses after resection. If the patient does not return for follow-up visits often enough, however, transplantation may no longer be possible by the time the recurrence is found. Although certain tumor characteristics can currently be used in the clinic to estimate the probability of short-term recurrence within two years, there is no method for estimating the probability of long-term recurrence. A prognostic product able to estimate the long-term recurrence probability could tell patients how likely they are to relapse within 4 to 5 years. This would remind patients to attend regular follow-up visits during that period, and it would allow liver transplantation or drug therapy to be arranged early.

At present, after most cancer surgeries the decision on whether a patient needs further aggressive treatment rests on the physician's experience, and differences in individual experience often make such judgments insufficiently accurate. Cancer usually has complex causes. Tracking postoperative recurrence with commonly used clinical biochemical markers (such as AFP or CA-125) or instrument-based examinations (such as computed tomography, ultrasound, or magnetic resonance imaging) is not proactive enough: apart from the questionable accuracy of the biochemical values, a positive finding usually means that the tumor has already grown large enough to be detected, or has even spread. International clinical studies have found that gene profiles can be used to predict some diseases, with the advantage that the probability of disease can be predicted even when the mechanism of the disease is unclear. The present invention therefore analyzes the expression profile of specific genes to predict the probability of postoperative recurrence or metastasis of liver cancer, so that patients can receive treatment early and survive longer.

To predict the risk of liver cancer recurrence and metastasis accurately and thereby solve the problems described above, one object of the present invention is to provide a gene combination for predicting the risk of postoperative recurrence in liver cancer patients, the combination being any combination of at least two genes among the nucleic acid sequences shown in SEQ ID NOs: 1 to 33.

In one embodiment of the present invention, the gene combination is the nucleic acid sequences shown in SEQ ID NOs: 1 to 6 and 8 to 10.

Using the above gene combination, the risk of postoperative recurrence in a liver cancer patient can be predicted by the following steps:

(1) obtaining a tumor tissue and a surrounding normal tissue from a liver cancer patient;

(2) analyzing a first gene combination and a second gene combination in the tumor tissue and the surrounding normal tissue, wherein the first gene combination consists of the nucleic acid sequences shown in SEQ ID NOs: 1 to 3, and the second gene combination is either gene combination II(a), consisting of the nucleic acid sequences shown in SEQ ID NOs: 4 to 6, or gene combination II(b), consisting of the nucleic acid sequences shown in SEQ ID NOs: 4, 5, and 7; dividing the expression intensity of each gene of these combinations in the tumor tissue by its expression intensity in the surrounding normal tissue to obtain the expression ratio of each gene; and, with the nucleic acid sequences shown in SEQ ID NOs: 1 and 4 as the shared genes, calculating a first gene combination interaction value and a second gene combination interaction value from each shared gene and the remaining genes of its combination; and

(3) determining that the patient is at high risk of recurrence when the first gene combination interaction value is below a first threshold and the second gene combination interaction value is greater than or equal to a second threshold, wherein the first gene combination interaction value is the product of the expression ratios of SEQ ID NO: 1 and SEQ ID NO: 2 plus the product of the expression ratios of SEQ ID NO: 1 and SEQ ID NO: 3; the interaction value of the second gene combination II(a) is the product of the expression ratios of SEQ ID NO: 4 and SEQ ID NO: 5 plus the product of the expression ratios of SEQ ID NO: 4 and SEQ ID NO: 6; and the interaction value of the second gene combination II(b) is the product of the expression ratios of SEQ ID NO: 4 and SEQ ID NO: 5 plus the product of the expression ratios of SEQ ID NO: 4 and SEQ ID NO: 7.
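The three steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented implementation: the function names, the dictionary keys such as `"SEQ1"`, the tumor-over-normal direction of the ratio, and all numeric values (including both thresholds) are hypothetical, since the text here gives no threshold values.

```python
from typing import Mapping, Sequence


def expression_ratio(tumor_intensity: float, normal_intensity: float) -> float:
    """Expression ratio of one gene (assumed here to be tumor / normal)."""
    return tumor_intensity / normal_intensity


def interaction_value(ratios: Mapping[str, float], shared: str,
                      partners: Sequence[str]) -> float:
    """Sum of (shared-gene ratio * paired-gene ratio) over the paired genes."""
    return sum(ratios[shared] * ratios[p] for p in partners)


def is_high_recurrence_risk(ratios: Mapping[str, float],
                            threshold_1: float, threshold_2: float,
                            second_partners: Sequence[str] = ("SEQ5", "SEQ6")) -> bool:
    """Step (3): first value below threshold 1 AND second value >= threshold 2.

    The default partners correspond to combination II(a); pass
    ("SEQ5", "SEQ7") for combination II(b).
    """
    first = interaction_value(ratios, "SEQ1", ("SEQ2", "SEQ3"))
    second = interaction_value(ratios, "SEQ4", second_partners)
    return first < threshold_1 and second >= threshold_2


# Made-up expression ratios and thresholds, for illustration only.
example_ratios = {"SEQ1": 0.5, "SEQ2": 1.0, "SEQ3": 2.0,
                  "SEQ4": 2.0, "SEQ5": 1.5, "SEQ6": 0.5, "SEQ7": 1.0}
print(is_high_recurrence_risk(example_ratios, threshold_1=2.0, threshold_2=3.0))
```

With these invented numbers the first interaction value is 0.5 × 1.0 + 0.5 × 2.0 = 1.5 and the second is 2.0 × 1.5 + 2.0 × 0.5 = 4.0, so the rule flags the sample as high risk. The same `interaction_value` helper applies to any combination that has one shared gene and several paired genes.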

Another object of the present invention is to provide a gene combination for predicting the risk of postoperative metastasis in liver cancer patients, the combination being any combination of at least two genes among the nucleic acid sequences shown in SEQ ID NOs: 28 and 34 to 42.

In one embodiment of the present invention, this gene combination is the nucleic acid sequences shown in SEQ ID NOs: 28 and 34 to 38.

Using the above gene combination, the risk of postoperative metastasis in a liver cancer patient can be predicted by the following steps:

(1) obtaining a tumor tissue and a surrounding normal tissue from a liver cancer patient;

(2) analyzing a fifth gene combination in the tumor tissue and the surrounding normal tissue, the fifth gene combination consisting of the nucleic acid sequences shown in SEQ ID NOs: 28, 34, and 35; dividing the expression intensity of each gene of the fifth gene combination in the tumor tissue by its expression intensity in the surrounding normal tissue to obtain the expression ratio of each gene; and, with the nucleic acid sequence shown in SEQ ID NO: 34 as the shared gene, calculating a fifth gene combination interaction value from it and the nucleic acid sequences shown in SEQ ID NOs: 35 and 28; and

(3) determining that the patient is at high risk of metastasis when the fifth gene combination interaction value is below a fifth threshold, wherein the fifth gene combination interaction value is the product of the expression ratios of SEQ ID NO: 34 and SEQ ID NO: 35 plus the product of the expression ratios of SEQ ID NO: 34 and SEQ ID NO: 28.

A further object of the present invention is to provide a method for obtaining genes that predict the postoperative prognosis of cancer patients, comprising:

(1) obtaining an excised tissue sample from each of a plurality of cancer patients;

(2) obtaining a gene expression value for each gene of the excised tissue samples;

(3) according to a prognosis type, selecting a plurality of gene pairs with strong gene-gene interaction, and classifying these genes into a plurality of gene clusters according to a plurality of shared genes, wherein a shared gene is the gene that the gene pairs have in common; and

(4) within each gene cluster, accumulating, in order of the significance of the gene interaction, the product of the gene expression values of at least one gene pair, and then, according to the prognosis type, using a decision tree model to produce the genes that predict the prognosis type.

In one embodiment of the present invention, the gene expression value of the above gene combination is an expression ratio determined, for each gene, from the amount of mRNA transcribed from the gene, or the amount of polypeptide translated from the gene, in the tumor and in the surrounding tissue of a liver cancer patient.

In another embodiment of the present invention, the prognosis type is recurrence or metastasis, and the gene pairs with strong gene-gene interaction in step (3) are those satisfying equation (1):

$$-\log_{10} p(v_A \times v_B) \ge C_{th} - \log_{10} p(v_A) - \log_{10} p(v_B) \qquad (1)$$

With the gene combinations and the method for predicting the postoperative prognosis of liver cancer patients according to the present invention, the accuracy of predicting postoperative recurrence in liver cancer patients is 76.47%, and the accuracy of predicting postoperative metastasis is 82.61%. These results show that the gene combinations and method of the present invention provide an effective way to predict the risk of postoperative recurrence or metastasis in liver cancer patients.

In light of the disclosure of this specification, anyone skilled in the art can use conventional techniques to produce a biochip or microarray assay for the gene combinations described herein, and can use the results of that assay to predict the risk of postoperative recurrence or metastasis in liver cancer patients.

Embodiments of the present invention are further described below with reference to the drawings. The description of the invention and the examples listed below serve to illustrate the invention and to provide further explanation of the scope of the claims.

Definitions

The term "shared gene" as used herein refers to the gene that is common to all gene pairs in a gene cluster.

The term "paired gene" as used herein refers to a gene that forms a gene pair with the shared gene; within a gene combination, the paired genes are all genes other than the shared gene.

The term "interaction value" as used herein refers to the sum of the products of the shared gene's expression ratio with the expression ratio of each of the 1st to n-th paired genes, where n is the total number of paired genes included in the calculation.
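The definition of the interaction value can be written compactly as follows. The symbols are introduced here only for illustration and do not appear in the original text: r_S denotes the expression ratio of the shared gene, r_{P_i} the expression ratio of the i-th paired gene, and I the interaction value.

```latex
I = \sum_{i=1}^{n} r_S \, r_{P_i} = r_S \sum_{i=1}^{n} r_{P_i}
```

For a combination with two paired genes this reduces to I = r_S r_{P_1} + r_S r_{P_2}, which matches the two-term sums given for the first, second, and fifth gene combinations in the prediction steps.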

One object of the present invention is to provide a combination of mutually interacting genes comprising all of the genes listed in Table 1 and Table 3, the sequence numbers of which are listed in Table 2 and Table 4, respectively.

The present invention also covers any subset of the genes in the same row of Table 1, for predicting whether a liver cancer patient relapses or does not relapse within 7.3 years. A preferred subset of nine genes is shown in Table 5.

A further object of the present invention is to provide a gene set of 11 genes, as shown in Table 3, for predicting whether a liver cancer patient develops metastasis or remains metastasis-free within 7.3 years.

The present invention also covers any subset of the genes in the same row of Table 3, for predicting whether a liver cancer patient develops metastasis or remains metastasis-free within 7.3 years. A preferred subset of six genes is shown in Table 6.

In the present invention, we propose a novel model for detecting gene-gene interactions. In the following examples, an algorithm based on shared genes is developed to group genes into gene clusters, and these clusters are processed with a decision tree to find the genes that distinguish recurrence from non-recurrence and metastasis from non-metastasis of liver cancer, as well as the relationships among those genes.

In the examples of the present invention, "recurrence" is defined as liver cancer arising again within the liver between 12 months after surgery and the last follow-up date; if liver cancer did not arise again during the follow-up period, the patient is classified as "no recurrence". "Metastasis" is defined as the spread of liver cancer cells to tissues outside the liver at any time during the follow-up period; if the cancer cells did not spread during this period, the patient is classified as "no metastasis". The follow-up period is the time from surgery to the last follow-up date; the median follow-up period of the patients used in the examples was 7.3 years.

Example 1: Method of screening genes

1-1 Patient criteria and sample collection

This study was approved by the human subjects review committee of National Taiwan University Hospital, and informed consent was obtained from all subjects. Patients with hepatocellular carcinoma (hereinafter, liver cancer) were divided into a training set and a test set. All patients in both sets had liver cancer confirmed by pathological analysis, and all surviving patients were followed for at least four years. Tumor and surrounding normal tissue specimens were obtained from every patient, either by surgical resection of the cancer tissue or by core needle biopsy.

The training set comprised 45 liver cancer patients, all men with a mean age of 52.4 ± 12.5 years, all infected with hepatitis B virus (HBV) and showing mild liver fibrosis (METAVIR grade F1-F2). With respect to recurrence, 21 patients relapsed and 17 did not. With respect to metastasis, 12 patients developed distant metastasis and 33 did not. Five patients had both recurrence and metastasis.

The test set comprised 24 liver cancer patients: 2 had neither HBV nor HCV, 12 had HBV, and 12 had HCV, of whom 2 had both HBV and HCV. There were 18 men with a mean age of 60.8 ± 8.2 years and 6 women with a mean age of 66.8 ± 10.5 years. With respect to recurrence, 12 of these patients relapsed and 5 did not. With respect to metastasis, 5 patients developed metastasis and 18 did not.

In the examples of the present invention, the data from the training set patients were used to train the algorithm of the invention and obtain genes that may predict liver cancer prognosis; these genes were then validated with the data from the test set patients.

1-2 RNA isolation and microarray

Frozen liver tissue sections were homogenized in TRIzol reagent (Invitrogen, Cat. no. 15596026, Life Technologies). Total RNA was extracted with the After Tri-reagent RNA clean-up kit (Favorgen, Cat. no. FAATR001) following the manufacturer's standard protocol. RNA quantity was measured with a spectrophotometer, and RNA size was assessed with RNA Nano LabChips on an Agilent 2100 Bioanalyzer. An RNA Integrity Number (RIN) was generated for each sample.

RNA was amplified with the Amino Allyl MessageAmp™ II aRNA Amplification Kit (Ambion, Austin, TX), an amplification method based on the T7 promoter [Van Gelder RN, von Zastrow ME, Yool A, Dement WC, Barchas JD, Eberwine JH. Amplified RNA synthesized from limited quantities of heterogeneous cDNA. Proc Natl Acad Sci U S A. 1990 Mar;87(5):1663-7]. Two versions of the Human array (HOAv4 and HOAv5) were used, each with a different fluorescent labeling method. The training set data comprised HOAv4 results (45 patients × 2 tissues × 3 hybridizations = 270 chips) and HOAv5 results (35 patients × 2 tissues × 1 hybridization = 70 chips). All HOAv5 patients were also among the HOAv4 patients, and the HOAv5 data were used only to select stable genes, not to calculate gene expression values or to build the decision tree model. The test set used HOAv4 only (24 patients × 2 tissues × 3 hybridizations = 144 chips).

For the HOAv4 experiments, 20 μg of Cy5-labeled aRNA was fragmented and hybridized to the chip as described by Fong [Fong S et al. Molecular mechanisms underlying selective cytotoxic activity of BZL101, an extract of Scutellaria barbata, towards breast cancer cells. Cancer Biol Ther. 2008 Apr;7(4):577-86]. For the HOAv5 experiments, the commercially available TotalPrep RNA amplification kit (Ambion) was used: 10 μg of biotinylated aRNA was fragmented and hybridized to the chip, followed by staining with Cy3-Streptavidin (PA43001, GE Healthcare). Finally, the arrays were scanned with a DNA microarray scanner (Agilent Technologies, Santa Clara, US), and fluorescence signal intensities were extracted with GenePix Pro 4.0 (Molecular Devices, Sunnyvale, CA).

1-3 Data collection and processing

For each gene on the HOAv4 arrays, the expression intensity was calculated as the difference between the foreground median intensity and the background median intensity at 635 nm; for the HOAv5 data, the measurement at 532 nm was used. For each tissue sample, replicates whose correlation coefficient with the other replicates was below 0.9 were removed, and the expression intensities of the remaining replicates were merged by taking the median. The expression ratio of a gene was obtained by dividing its expression intensity in the tumor tissue by its expression intensity in the surrounding normal tissue, and z-score normalization was then applied so that the mean and standard deviation of the expression ratios were normalized across samples. Finally, we selected 6,505 genes with an expression intensity of at least 500 and a correlation of at least 0.4 between the HOAv4 and HOAv5 results.

According to the sample sizes recommended by Tsai et al. (2005) [Tsai C-A et al. (2005). Sample size for gene expression microarray experiments. Bioinformatics, 21(8): 1502-8], our training set (sample size per group 12, number of genes = 6,505) can achieve an accuracy greater than 0.99, or a specificity of 0.90 at a family-wise power of 90% when the expected number of false positives is 1 and the expected mean difference (standardized effect size) is 2. In addition, the use of technical replicates of the chips and of expression ratios may further improve these figures.
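The processing chain described above (background subtraction, median merging of replicates, tumor-to-normal ratios, and z-score normalization) can be sketched with the Python standard library. This is a minimal illustration, not the authors' pipeline: the function names and all numbers are hypothetical, the replicate-correlation filter is omitted, and two details the text does not specify are assumed here, namely that the z-score is taken across the genes of one sample and that the population standard deviation is used.

```python
from statistics import mean, median, pstdev
from typing import List, Sequence


def spot_intensity(foreground_median: float, background_median: float) -> float:
    """Expression intensity = foreground median minus background median."""
    return foreground_median - background_median


def merge_replicates(replicates: Sequence[Sequence[float]]) -> List[float]:
    """Merge replicate hybridizations gene by gene using the median."""
    return [median(values) for values in zip(*replicates)]


def expression_ratios(tumor: Sequence[float], normal: Sequence[float]) -> List[float]:
    """Per-gene ratio of tumor intensity to surrounding-normal intensity."""
    return [t / n for t, n in zip(tumor, normal)]


def z_score(values: Sequence[float]) -> List[float]:
    """Standardize values to mean 0 and standard deviation 1 (assumed per sample)."""
    mu = mean(values)
    sd = pstdev(values)  # population standard deviation; an assumption
    return [(v - mu) / sd for v in values]


# Made-up intensities for two genes measured in three replicate hybridizations.
tumor = merge_replicates([[100.0, 200.0], [110.0, 190.0], [90.0, 210.0]])
normal = [50.0, 400.0]
ratios = expression_ratios(tumor, normal)
print(ratios, z_score(ratios))
```

Standardizing each sample this way puts the ratios of different samples on a common scale, which is what allows products of ratios to be compared between patients later on. Note that `z_score` fails with a division by zero if all values in a sample are identical.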

Example 2:

2-1 Detection of gene-gene interactions and construction of gene clusters

Recent studies indicate that gene-gene interactions may play a more important role than single genes in many complex diseases. We propose detecting gene-gene interactions by identifying pairs of genes for which the product of their expression values on the microarray differs between the case group (recurrence or metastasis) and the control group (no recurrence or no metastasis) much more significantly than either gene does alone. The expression value of each gene may be its expression intensity in a single tissue, or the expression ratio obtained by dividing its expression intensity in the tumor tissue by that in the surrounding normal tissue. In one embodiment of the present invention, the expression ratios of genes A and B are denoted v_A and v_B, respectively. The idea can be expressed as:

$$-\log_{10} p(v_A \times v_B) \ge C_{th} - \log_{10} p(v_A) - \log_{10} p(v_B) \tag{1}$$

$p(v)$ is the Student's t-test p-value of the gene expression ratio between cases and controls, and $C_{th}$ is a positive constant. Gene pairs satisfying equation (1) were found to be highly correlated between the case and control groups, and the z-score normalization used in the preprocessing step further allows us to tell whether two interacting genes are strongly positively correlated (synergistic) or strongly negatively correlated (antagonistic). In addition, many interacting gene pairs share a common gene (the shared gene) and can be used to classify groups of subjects. We aggregate these gene pairs into gene clusters, which may help clarify some disease mechanisms.

2-2 Building a decision tree from gene clusters

We assume that each constructed gene cluster represents a known mechanism; the next step is therefore to determine which mechanism has the greatest influence on recurrence versus non-recurrence, or on metastasis versus non-metastasis. We use a decision tree model, which is easy to interpret, to build the prediction model. We first rank the gene pairs within each gene cluster by their degree of significance (sig_diff), defined as follows:

$$\mathit{sig\_diff} = \log_{10} p(v_A) + \log_{10} p(v_B) - \log_{10} p(v_A \times v_B) \tag{2}$$
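As a worked illustration of equations (1) and (2), the following Python sketch computes sig_diff from three p-values and checks the interaction criterion. It assumes that equation (1) is read as "sig_diff is at least C_th", and it takes the p-values as given rather than computing the t-tests; the function names and the example p-values are hypothetical, and the default of 1.5 is the value of C_th used in the examples of this document.

```python
import math


def sig_diff(p_a, p_b, p_ab):
    """Equation (2): log10 p(vA) + log10 p(vB) - log10 p(vA x vB)."""
    return math.log10(p_a) + math.log10(p_b) - math.log10(p_ab)


def is_interacting(p_a, p_b, p_ab, c_th=1.5):
    """Equation (1) rearranged (assumed reading): sig_diff >= C_th."""
    return sig_diff(p_a, p_b, p_ab) >= c_th


# Made-up p-values: the product p_a * p_b = 0.02 is 40 times larger than p_ab.
p_a, p_b, p_ab = 0.2, 0.1, 0.0005
score = sig_diff(p_a, p_b, p_ab)
print(round(score, 4), is_interacting(p_a, p_b, p_ab))
```

A pair passes only when the p-value of the product is smaller than the product of the individual p-values by a factor of at least 10 raised to C_th, so two individually unremarkable genes can still qualify as an interacting pair.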

The products of the gene expression ratios of the top n gene pairs in a gene cluster are summed to represent the cluster's underlying mechanism. This is done to preserve a highly discriminative grouping while reducing the non-specific noise contributed by any single gene pair. When a decision tree is constructed, highly discriminative variables are taken into consideration one after another, so the order of the variables affects the structure of the resulting tree. We order the variables by the number of gene pairs contained in each gene cluster, because larger gene clusters may play a more important role in the mechanism underlying the final outcome (the prognosis type, such as recurrence or metastasis). Multiple decision trees are then constructed using genes of different degrees of reproducibility and the gene-pair variables of the gene clusters. The final decision tree is determined by the 0.632 bootstrap [Efron B. Estimating the error rate of a prediction rule: improvement on cross-validation. J. Am. Stat. Assoc. 1983;78,316-331], a statistical method suited to small sample sizes [Braga-Neto UM and Dougherty ER. Is cross-validation valid for small-sample microarray classification? Bioinformatics. 2004 Feb 12;20(3):374-80.]. The algorithm used to determine the appropriate gene pairs in each gene cluster and to construct the decision tree is summarized as follows:

Step 1: From the array experiments of Example 1.2, take the gene expression ratios on the HOAv4 and HOAv5 arrays of the 23 patients who passed the screening of Example 1.3, calculate the statistical correlation of each gene's expression ratio between HOAv4 and HOAv5, and select all genes whose correlation exceeds the threshold (corr_thres; here, greater than or equal to 0.4);

Step 2: From the HOAv4 data of the 45 patients in the training group, take the qualified genes from Step 1 and further select the gene pairs that satisfy equation (1);

Step 3: Group the gene pairs by shared gene to construct gene clusters. Let the total number of gene clusters be noc;

Step 3.1: Sort the gene clusters in descending order of the number of gene pairs they contain. Let the number of gene pairs in gene cluster i be nop_i, where i is the cluster index, i = 1, 2, ..., noc;

Step 3.2: Within each gene cluster, sort the gene pairs in descending order of the gene-gene interaction significance obtained from equation (2);

Step 4: From the decision trees constructed as follows, select the one with the smallest average expected error.

Step 4.1: Construct an input matrix M using the top p gene pairs of each gene cluster (p = 1 up to the largest number of gene pairs in any gene cluster);

Step 4.1.1: Compute the vector v, where $v = [v_1, v_2, \ldots, v_c, \ldots, v_{noc}]^T$, $v_c = [v_{c,1}, v_{c,2}, \ldots, v_{c,k}, \ldots, v_{c,\min(p,\,nop_c)}]^T$, and $v_{c,k}$ is the product of the gene expression ratios of the k-th gene pair in gene cluster c. One vector v is built in this way for each patient;

Step 4.1.2: Arrange the vectors v column-wise to form the input matrix M;

Step 4.2: Construct a decision tree from the matrix M and the prognosis classes;

Step 4.3: Calculate the average expected error of the constructed decision tree using the 0.632 bootstrap.
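Steps 3 to 4.1 and the error estimate of Step 4.3 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation behind the reported results: a gene is treated as a shared gene when it appears in at least two pairs, a pair may therefore fall into two clusters, the matrix is built with one row per patient (the transpose of a column-wise layout), and the decision tree training itself is left out. All function names, gene labels and scores are hypothetical. The 0.632 estimate follows Efron's weighting of the resubstitution error and the out-of-bootstrap error.

```python
from collections import defaultdict


def build_clusters(scored_pairs):
    """Steps 3 to 3.2. scored_pairs: list of (gene_a, gene_b, sig_diff) tuples."""
    by_gene = defaultdict(list)
    for pair in scored_pairs:
        gene_a, gene_b, _ = pair
        by_gene[gene_a].append(pair)
        by_gene[gene_b].append(pair)
    clusters = []
    for gene, pairs in by_gene.items():
        if len(pairs) >= 2:  # assumption: a shared gene occurs in two or more pairs
            ranked = sorted(pairs, key=lambda t: t[2], reverse=True)  # Step 3.2
            clusters.append((gene, ranked))
    clusters.sort(key=lambda c: len(c[1]), reverse=True)  # Step 3.1
    return clusters


def feature_vector(clusters, ratios, p):
    """Step 4.1.1: products of ratios for the top min(p, nop) pairs of each cluster."""
    v = []
    for _, pairs in clusters:
        for gene_a, gene_b, _ in pairs[:p]:
            v.append(ratios[gene_a] * ratios[gene_b])
    return v


def input_matrix(clusters, patients, p):
    """Step 4.1.2, with one row per patient (assumed orientation)."""
    return [feature_vector(clusters, ratios, p) for ratios in patients]


def bootstrap_632(resubstitution_error, out_of_bag_error):
    """0.632 bootstrap error estimate."""
    return 0.368 * resubstitution_error + 0.632 * out_of_bag_error


# Made-up gene labels, significance scores and expression ratios.
pairs = [("A", "B", 2.4), ("A", "C", 2.9), ("D", "B", 1.8), ("A", "E", 1.7)]
clusters = build_clusters(pairs)
patient = {"A": 2.0, "B": 0.5, "C": 1.5, "D": 1.0, "E": 3.0}
M = input_matrix(clusters, [patient], p=2)
print([gene for gene, _ in clusters], M, bootstrap_632(0.0, 0.25))
```

In the full procedure a tree would be trained on M for each value of p and each candidate set of clusters, and the tree with the smallest 0.632 estimate would be kept.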

2-3 Handling imbalanced patient group proportions

Our training group contains 12 patients with metastasis and 33 patients without metastasis. Such a small and imbalanced patient population (the non-metastatic patients are almost three times as numerous as the metastatic patients) may cause the wrong gene pairs to be selected and the classification model to predict poorly [Gustavo E. A. P. A. Batista, Ronaldo C. Prati, and Maria Carolina Monard. A study of the behavior of several methods for balancing machine learning training data. In Special issue on learning from imbalanced datasets, volume 6, pages 20-29, 2004.]. To reduce the mis-selection of gene pairs, random under-sampling of the majority class was used during model building [Ricardo Barandela, Rosa M. Valdovinos, J. Salvador Sanchez, and Francesc J. Ferri. The imbalanced training sample problem: Under or over sampling? Lecture Notes in Computer Science, 3138:806-814, 2004]. Ten data sets were therefore generated from the original training-group data, each containing the 12 metastatic patients and 12 non-metastatic patients drawn at random from the original 33. Each data set was searched for gene pairs satisfying equation (1); a gene pair that appeared in at least 5 of the data sets was retained and used to construct gene clusters. For each qualified gene pair, the median of the 10 significance values computed from the 10 data sets (according to equation (2)) was taken as the pair's final significance value.
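The resampling scheme above can be sketched in Python as follows. The sketch assumes that a sig_diff value is available for every candidate pair in every resampled data set, and that a pair "appears" in a data set when its sig_diff reaches C_th there; the function names, patient labels, seed and scores are hypothetical.

```python
import random
from statistics import median


def balanced_subsets(minority_ids, majority_ids, n_subsets=10, seed=0):
    """Random under-sampling: keep all minority cases, draw as many majority cases."""
    rng = random.Random(seed)
    k = len(minority_ids)
    return [list(minority_ids) + rng.sample(list(majority_ids), k)
            for _ in range(n_subsets)]


def stable_pairs(scores_per_subset, c_th=1.5, min_hits=5):
    """Keep pairs passing the threshold in at least min_hits subsets.

    scores_per_subset: one dict {pair: sig_diff} per subset; every dict is
    assumed to contain every candidate pair. The final score of a kept pair
    is the median of its values over all subsets.
    """
    kept = {}
    for pair in set().union(*scores_per_subset):
        values = [scores[pair] for scores in scores_per_subset]
        hits = sum(v >= c_th for v in values)
        if hits >= min_hits:
            kept[pair] = median(values)
    return kept


metastatic = [f"M{i}" for i in range(12)]
non_metastatic = [f"N{i}" for i in range(33)]
subsets = balanced_subsets(metastatic, non_metastatic)

# Made-up scores: pair (A, B) passes in 6 subsets, pair (A, C) in only 4.
scores_per_subset = [
    {("A", "B"): 1.6 if i < 6 else 1.0, ("A", "C"): 2.0 if i < 4 else 0.5}
    for i in range(10)
]
kept = stable_pairs(scores_per_subset)
print(len(subsets), kept)
```

Requiring a pair to recur in several balanced subsets filters out pairs whose apparent significance depends on which non-metastatic patients happened to be drawn.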

Besides addressing the imbalance in the data, we also adjusted the cost matrix used in training the decision tree [Japkowicz N, Class Imbalances: Are we Focusing on the Right Issue?, In Proceedings of the ICML'03 Workshop on Learning from Imbalanced Data Sets, 2003] to improve classification accuracy. The cost matrix is a two-dimensional square matrix $M_c$, where $M_c(i,j)$ is the cost of classifying a subject of class i as class j. In this prediction model the cost matrix is set to $M_c$ = [0 33; 12 0]: because fewer subjects in this case belong to the metastasis class than to the non-metastasis class, a penalty weight is applied whenever a metastatic patient is classified as non-metastatic, to reduce the probability of this kind of misclassification.
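The effect of the cost matrix can be illustrated with a short Python sketch. The class order is an assumption (index 0 for metastasis, index 1 for non-metastasis), chosen so that the heavier penalty of 33 falls on a metastatic patient classified as non-metastatic, as the text describes; the function name and the labels are hypothetical.

```python
# Cost matrix [0 33; 12 0]; row = true class, column = predicted class.
# Assumed order: index 0 = metastasis, index 1 = non-metastasis.
COST = [[0, 33],
        [12, 0]]


def total_cost(true_labels, predicted_labels, cost=COST):
    """Sum the misclassification cost over all subjects."""
    return sum(cost[t][p] for t, p in zip(true_labels, predicted_labels))


# Made-up labels: one missed metastasis (cost 33) and one false alarm (cost 12).
true_labels = [0, 0, 1, 1, 1]
predicted_labels = [1, 0, 1, 0, 1]
print(total_cost(true_labels, predicted_labels))
```

With 12 metastatic and 33 non-metastatic patients, these weights give both classes the same total weight (12 x 33 = 33 x 12), so a tree that minimizes total cost cannot do well simply by predicting the majority class.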

Example 3: Classification and results for liver cancer patients with and without recurrence

3-1 Construction and selection of gene clusters

To take only strongly interacting gene pairs into account, $C_{th}$ in equation (1) is set to 1.5 in this embodiment of the present invention. This means that the p-value (Student's t-test) of the product of the two genes' expression ratios is at least 10^1.5 = 31.62 times smaller than the product of the two genes' individual p-values. A total of 2,416 gene pairs satisfy this setting.

3-2 Constructing the decision tree for recurrence

The decision tree determined by the minimum classification error calculated with the 0.632 bootstrap is shown in Figure 1, and the genes selected in its gene clusters are listed in Table 5. The best decision tree was selected with 541 gene clusters (noc = 541), and during decision tree training the top two gene pairs of every gene cluster (nop = 2) were used as input. The best decision tree, however, uses only three gene clusters: those whose shared genes are EXOC6 (SEQ ID NO:1), PDDC1 (SEQ ID NO:4) and TCEAL1 (SEQ ID NO:8).

This decision tree classifies the HOAv4 training group with an accuracy of 100%. Applied to the test group, its classification accuracy is 76.47% (13/17), meaning that 13 of the 17 liver cancer patients were classified correctly; the sensitivity is 0.83 (10/12), meaning that 10 of the 12 patients with recurrence were correctly classified as recurrent; and the specificity is 0.60 (3/5), meaning that 3 of the 5 patients without recurrence were correctly classified as non-recurrent. The decision tree shows high sensitivity and specificity in both the training group and the test group, which confirms it as an effective method for predicting whether liver cancer will recur.
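The test-group figures above follow directly from the counts given in the text, as this short Python check shows. The function name is hypothetical; the counts (10 of 12 recurrent patients and 3 of 5 non-recurrent patients classified correctly) are those stated above.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Recurrence test group: 12 recurrent patients (10 correct), 5 non-recurrent (3 correct).
metrics = classification_metrics(tp=10, fn=2, tn=3, fp=2)
print({name: round(value, 4) for name, value in metrics.items()})
```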

The sequences of the genes in Table 5 are given in the sequence listing: EXOC6 (SEQ ID NO:1), ZNF692 (SEQ ID NO:2), PTGDS (SEQ ID NO:3), PDDC1 (SEQ ID NO:4), MYBL2 (SEQ ID NO:5), GNG13 (SEQ ID NO:6), TCEAL1 (SEQ ID NO:8), BANK1 (SEQ ID NO:9) and C1orf54 (SEQ ID NO:10).

As Figure 1 shows, classification by node 1 (EXOC6, ZNF692 and PTGDS) and node 2 (PDDC1, MYBL2 and GNG13) already reaches a prediction accuracy of 94% in the training group and 59% in the test group; once the classification at node 3 is added, the prediction accuracy reaches 100% in the training group and 76.47% in the test group.
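The first two nodes can be read together with the definition of the interaction value given in the claims (the shared gene's expression ratio multiplied by each partner's ratio, with the products summed). The Python sketch below illustrates only the branch stated in the corresponding method claim: high recurrence risk when the first interaction value is below a first threshold and the second is at or above a second threshold. The threshold values and the expression ratios are made up, since the actual thresholds are not given in this section, and the function names are hypothetical.

```python
def interaction_value(ratios, shared_gene, partners):
    """Sum of (shared gene ratio x partner ratio) over the partner genes."""
    return sum(ratios[shared_gene] * ratios[gene] for gene in partners)


def high_recurrence_risk(ratios, first_threshold, second_threshold):
    """One branch only: first value below its threshold, second at or above its own."""
    first = interaction_value(ratios, "EXOC6", ["ZNF692", "PTGDS"])
    second = interaction_value(ratios, "PDDC1", ["MYBL2", "GNG13"])
    return first < first_threshold and second >= second_threshold


# Made-up expression ratios and thresholds, for illustration only.
ratios = {"EXOC6": 0.5, "ZNF692": 1.0, "PTGDS": 2.0,
          "PDDC1": 2.0, "MYBL2": 1.5, "GNG13": 0.5}
first = interaction_value(ratios, "EXOC6", ["ZNF692", "PTGDS"])
second = interaction_value(ratios, "PDDC1", ["MYBL2", "GNG13"])
print(first, second, high_recurrence_risk(ratios, 2.0, 3.0))
```

Patients whose first interaction value is at or above the first threshold are routed to the third node, which this sketch does not cover.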

In addition, the same method yields further decision trees, namely those composed of the genes shown in Table 1, in which each row represents one decision tree; the sequence numbers of these genes are listed in Table 2. The gene combinations shown in Table 1 also give good predictions of liver cancer recurrence risk.

Example 4: Classification and results for liver cancer patients with and without metastasis

4-1 Selection and construction of gene clusters

After adjusting for the imbalanced training-group data and setting $C_{th}$ in equation (1) to 1.5, 347 gene pairs were obtained from the metastasis data.

4-2 Constructing the decision tree for metastasis

The decision tree determined by the minimum classification error calculated with the 0.632 bootstrap is shown in Figure 2, and the genes selected in its gene clusters are listed in Table 6. The best decision tree was selected with 13 gene clusters (noc = 13), and during decision tree training the top two gene pairs of every gene cluster (nop = 2) were used as input. The best decision tree, however, uses only two gene clusters: those whose shared genes are BMP4 (SEQ ID NO:34) and CYP2E1 (SEQ ID NO:36).

This decision tree classifies the HOAv4 training group with an accuracy of 98% (44/45), meaning that 44 of the 45 liver cancer patients were classified correctly; the sensitivity is 0.92 (11/12), meaning that 11 of the 12 patients with metastasis were correctly classified as metastatic; and the specificity is 1.0 (33/33), meaning that all 33 patients without metastasis were correctly classified as non-metastatic. Applied to the test group, the classification accuracy is 82.61% (19/23), meaning that 19 of the 23 liver cancer patients were classified correctly; the sensitivity is 0.40 (2/5), meaning that 2 of the 5 patients with metastasis were correctly classified as metastatic; and the specificity is 0.94 (17/18), meaning that 17 of the 18 patients without metastasis were correctly classified as non-metastatic. The four misclassified patients in Figure 2 (shown on a gray background) comprise two male patients with metastasis, one female patient with metastasis and one female patient without metastasis. One of the misclassified male patients with metastasis had HBV-related liver cancer; the remaining misclassified patients all had HCV-related liver cancer. This decision tree has high sensitivity in the training and test groups and can therefore serve as an effective way to distinguish cancer patients with metastasis from those without.

The sequences of the genes in Table 6 are given in the sequence listing: BMP4 (SEQ ID NO:34), C5orf32 (SEQ ID NO:35), DTX3 (SEQ ID NO:28), CYP2E1 (SEQ ID NO:36), YARS (SEQ ID NO:37) and CLDN4 (SEQ ID NO:38).

As Figure 2 shows, node 1 (BMP4, C5orf32 and DTX3) alone reaches a prediction accuracy of 91% in the training group and 74% in the test group; once the classification at node 2 is added, the prediction accuracy reaches 98% in the training group and 82.61% in the test group.

In addition, the same method yields further decision trees, namely those composed of the genes shown in Table 3, in which each row represents one decision tree; the sequence numbers of these genes are listed in Table 4. The gene combinations shown in Table 3 also give good predictions of liver cancer metastasis risk.

As the above examples show, the method of the present invention derives from the training group the gene combinations and the method for predicting the postoperative prognosis of liver cancer patients. The accuracy in predicting postoperative recurrence in liver cancer patients is 76.47%, and the accuracy in predicting postoperative metastasis is 82.61%. These results show that the gene combinations of the present invention, together with classification by the resulting decision trees, provide an effective method for predicting the risk of postoperative recurrence or metastasis in liver cancer patients.

In addition, the gene combinations obtained above can be made into a biochip or microarray assay using conventional techniques, and the results of that assay can be used to predict the risk of postoperative recurrence or metastasis in liver cancer patients.

Although embodiments of the present invention are disclosed above, they are not intended to limit the scope of the invention. Anyone skilled in the art may make minor changes and refinements without departing from the spirit and scope of the invention; the scope of protection is therefore defined by the appended claims.

<110> Phalanx Biotech Group Inc (華聯生物科技股份有限公司)
<120> Gene set for predicting post-surgery recurrence or metastasis risk in cancer patients and method thereof
<160> 57

<210> 1
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> EXOC6
<400> 1

<210> 2
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> ZNF692
<400> 2

<210> 3
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> PTGDS
<400> 3

<210> 4
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> PDDC1
<400> 4

<210> 5
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> MYBL2
<400> 5

<210> 6
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> GNG13
<400> 6

<210> 7
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> ROBO3
<400> 7

<210> 8
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> TCEAL1
<400> 8

<210> 9
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> BANK1
<400> 9

<210> 10
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> C1orf54
<400> 10

<210> 11
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> BANP
<400> 11

<210> 12
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> LRRC14
<400> 12

<210> 13
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> LOC730427
<400> 13

<210> 14
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> ELOVL7
<400> 14

<210> 15
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> KIAA1522
<400> 15

<210> 16
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> EVI2B
<400> 16

<210> 17
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> MBOAT1
<400> 17

<210> 18
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> CENPO
<400> 18

<210> 19
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> LMBR1L
<400> 19

<210> 20
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> SPOCK2
<400> 20

<210> 21
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> GNL3L
<400> 21

<210> 22
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> DCTN5
<400> 22

<210> 23
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> ZNF8
<400> 23

<210> 24
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> STK32C
<400> 24

<210> 25
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> ZNF480
<400> 25

<210> 26
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> CHAF1A
<400> 26

<210> 27
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> PPP1R13L
<400> 27

<210> 28
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> DTX3
<400> 28

<210> 29
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> CDCA4
<400> 29

<210> 30
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> NCOR2
<400> 30

<210> 31
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> SLAMF8
<400> 31

<210> 32
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> PTPRO
<400> 32

<210> 33
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> DYNC2H1
<400> 33

<210> 34
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> BMP4
<400> 34

<210> 35
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> C5orf32
<400> 35

<210> 36
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> CYP2E1
<400> 36

<210> 37
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> YARS
<400> 37

<210> 38
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> CLDN4
<400> 38

<210> 39
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> WISP2
<400> 39

<210> 40
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> LIMK2
<400> 40

<210> 41
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> GAL3ST4
<400> 41

<210> 42
<211> 60
<212> DNA
<213> Artificial sequence
<220>
<223> AK1
<400> 42

Figure 1 shows the decision tree produced for classifying liver cancer recurrence versus non-recurrence, together with the results of classifying the patients in the training group and the test group with this decision tree.

Figure 2 shows the decision tree produced for classifying liver cancer metastasis versus non-metastasis, together with the results of classifying the patients in the training group and the test group with this decision tree.

Claims (25)

1. A gene combination for predicting the risk of postoperative recurrence in a liver cancer patient, being any combination of at least two genes among the nucleic acid sequences shown in SEQ ID NOs: 1 to 33.

2. The gene combination of claim 1, wherein the gene combination consists of the nucleic acid sequences shown in SEQ ID NOs: 1 to 6 and 8 to 10.

3. The gene combination of claim 1, which is used for biochip, microarray or real-time quantitative polymerase technique analysis.

4. A method for predicting the risk of postoperative recurrence in a liver cancer patient from liver cancer tissue, comprising: (1) obtaining a tumor tissue and a surrounding normal tissue from a liver cancer patient; (2) analyzing a first gene combination and a second gene combination in the tumor tissue and the surrounding normal tissue, wherein the first gene combination consists of the nucleic acid sequences shown in SEQ ID NOs: 1 to 3, and the second gene combination is gene combination II(a), consisting of the nucleic acid sequences shown in SEQ ID NOs: 4 to 6, or gene combination II(b), consisting of the nucleic acid sequences shown in SEQ ID NOs: 4, 5 and 7; and dividing the gene expression intensity of each gene of these gene combinations in the tumor tissue by its gene expression intensity in the surrounding normal tissue to obtain the expression ratio of each gene in these gene combinations, wherein the nucleic acid sequences shown in SEQ ID NOs: 1 and 4 are shared genes, which are used with the remaining genes of their gene combination to calculate a first gene combination interaction value and a second gene combination interaction value; and (3) determining the patient to be at high risk of recurrence when the first gene combination interaction value is lower than a first threshold and the second gene combination interaction value is higher than or equal to a second threshold, wherein the first gene combination interaction value is the product of the gene expression ratios of the nucleic acid sequences shown in SEQ ID NO: 1 and SEQ ID NO: 2 plus the product of the gene expression ratios of the nucleic acid sequences shown in SEQ ID NO: 1 and SEQ ID NO: 3; the interaction value of second gene combination II(a) is the product of the gene expression ratios of the nucleic acid sequences shown in SEQ ID NO: 4 and SEQ ID NO: 5 plus the product of the gene expression ratios of the nucleic acid sequences shown in SEQ ID NO: 4 and SEQ ID NO: 6; and the interaction value of second gene combination II(b) is the product of the gene expression ratios of the nucleic acid sequences shown in SEQ ID NO: 4 and SEQ ID NO: 5 plus the product of the gene expression ratios of the nucleic acid sequences shown in SEQ ID NO: 4 and SEQ ID NO: 7.
5. The method of claim 4, wherein, if gene combination II(a) is selected for analysis in step (2), a third gene combination is further analyzed, the third gene combination being: gene combination III(a), consisting of the nucleic acid sequences shown in SEQ ID NOs: 8 to 10; gene combination III(b), consisting of the nucleic acid sequences shown in SEQ ID NOs: 13 to 15; gene combination III(c), consisting of the nucleic acid sequences shown in SEQ ID NOs: 19 and 12; gene combination III(d), consisting of the nucleic acid sequences shown in SEQ ID NOs: 20 to 21; gene combination III(e), consisting of the nucleic acid sequences shown in SEQ ID NOs: 22 to 24; or gene combination III(f), consisting of the nucleic acid sequences shown in SEQ ID NOs: 32 to 33, wherein the nucleic acid sequences shown in SEQ ID NOs: 8, 13, 19, 20, 22 and 32 are shared genes, which are used with the remaining genes of the gene combination to calculate a third gene combination interaction value; and wherein the patient is determined to be at high risk of recurrence when the first gene combination interaction value is higher than or equal to a first threshold and either the interaction value of third gene combination III(a) is higher than or equal to a third threshold or the interaction value of third gene combination III(b) to III(f) is lower than the third threshold.
The method of claim 4, wherein, if gene combination II(b) is selected for analysis in step (2), a fourth gene combination is further analyzed, the fourth gene combination being: gene combination IV(a) consisting of the nucleic acid sequences shown in SEQ ID NOS: 11-12; gene combination IV(b) consisting of the nucleic acid sequences shown in SEQ ID NOS: 16-18; gene combination IV(c) consisting of the nucleic acid sequences shown in SEQ ID NOS: 25-27; gene combination IV(d) consisting of the nucleic acid sequences shown in SEQ ID NOS: 28-30; or gene combination IV(e) consisting of the nucleic acid sequences shown in SEQ ID NOS: 31 and 2; wherein the nucleic acid sequences shown in SEQ ID NOS: 11, 16, 25, 28 and 31 are shared genes, each of which is used with the remaining genes in its gene combination to calculate a fourth gene combination interaction value; and when the first gene combination interaction value is higher than or equal to a first threshold and the fourth gene combination interaction value is lower than a fourth threshold, the patient is identified as having a high risk of recurrence.
The method of any one of claims 4 to 6, wherein the gene expression intensity of the gene combination is determined from the amount of mRNA transcribed from each gene or the amount of polypeptide translated from the gene.
The method of any one of claims 4 to 6, wherein the liver cancer tissue is taken from surgically resected cancer tissue or obtained by core needle biopsy.
A gene combination for predicting the risk of postoperative metastasis in a liver cancer patient, being any combination of at least two genes among the nucleic acid sequences shown in SEQ ID NOS: 28 and 34-42.
The gene combination of claim 9, wherein the gene combination is the nucleic acid sequences shown in SEQ ID NOS: 28 and 34-38.
The gene combination of claim 9, for use in biochip, microarray or real-time quantitative polymerase technology analysis.
A method for predicting the risk of postoperative metastasis in a liver cancer patient from liver cancer tissue, comprising: (1) obtaining a tumor tissue and a surrounding normal tissue of a liver cancer patient; (2) analyzing a fifth gene combination of the tumor tissue and the surrounding normal tissue, the fifth gene combination consisting of the nucleic acid sequences shown in SEQ ID NOS: 28 and 34-35, and dividing the gene expression intensities of each gene in the fifth gene combination in the tumor tissue and in the surrounding normal tissue to obtain a gene expression ratio for each gene, wherein the nucleic acid sequence shown in SEQ ID NO: 34 is a shared gene, which is used with the nucleic acid sequences shown in SEQ ID NOS: 35 and 28 to calculate a fifth gene combination interaction value; and (3) when the fifth gene combination interaction value is lower than a fifth threshold, identifying the patient as having a high risk of metastasis; wherein the fifth gene combination interaction value is the product of the gene expression ratios of the nucleic acid sequences shown in SEQ ID NO: 34 and SEQ ID NO: 35 plus the product of the gene expression ratios of the nucleic acid sequences shown in SEQ ID NO: 34 and SEQ ID NO: 28.
The method of claim 12, wherein step (2) further analyzes a sixth gene combination, the sixth gene combination being: gene combination VI(a) consisting of the nucleic acid sequences shown in SEQ ID NOS: 36-38; gene combination VI(b) consisting of the nucleic acid sequences shown in SEQ ID NOS: 39-40; or gene combination VI(c) consisting of the nucleic acid sequences shown in SEQ ID NOS: 41-42; wherein the nucleic acid sequences shown in SEQ ID NOS: 36, 39 and 41 are shared genes, each of which is used with the remaining genes in its gene combination to calculate a sixth gene combination interaction value; and when the fifth gene combination interaction value is higher than or equal to a fifth threshold and the sixth gene combination interaction value is lower than a sixth threshold, the patient is identified as having a high risk of metastasis.
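The two metastasis rules above (the fifth gene combination alone, then the fifth and sixth combinations together) form a small decision cascade. The Python sketch below is one possible reading of that cascade, not the patent's implementation: the function name, the treatment of an unanalyzed sixth combination, and all numbers are assumptions, and the claims give no numeric thresholds.

```python
from typing import Optional


def is_high_metastasis_risk(
    fifth_value: float,
    fifth_threshold: float,
    sixth_value: Optional[float] = None,
    sixth_threshold: Optional[float] = None,
) -> bool:
    """Apply the fifth-combination rule, then the optional sixth-combination rule."""
    # Primary rule: fifth interaction value below its threshold means high risk.
    if fifth_value < fifth_threshold:
        return True
    # Secondary rule: fifth value at or above its threshold AND sixth value
    # below its threshold also means high risk.
    if sixth_value is not None and sixth_threshold is not None:
        return sixth_value < sixth_threshold
    # Neither rule fired (assumption: report "not flagged as high risk").
    return False


# Hypothetical interaction values and thresholds, for illustration only.
print(is_high_metastasis_risk(1.2, fifth_threshold=2.0))
print(is_high_metastasis_risk(2.5, 2.0, sixth_value=0.8, sixth_threshold=1.0))
print(is_high_metastasis_risk(2.5, 2.0, sixth_value=1.4, sixth_threshold=1.0))
```

Returning `False` only means that neither claimed rule classified the patient as high risk; the claims do not define a separate low-risk category.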
The method of any one of claims 12 to 13, wherein the expression level of the gene combination is determined from the amount of mRNA transcribed from each gene or the amount of polypeptide translated from the gene.
The method of any one of claims 12 to 13, wherein the liver cancer tissue is taken from surgically resected cancer tissue or obtained by core needle biopsy.
A method for obtaining genes that predict the postoperative prognosis of cancer patients, comprising: (1) obtaining an excised tissue sample from each of a plurality of cancer patients; (2) obtaining a gene expression value for each gene of the excised tissue sample; (3) according to a prognosis type, selecting a plurality of gene pairs with strong gene-gene interaction, and grouping the genes into a plurality of gene clusters according to a plurality of shared genes, wherein the shared genes are the genes that the gene pairs have in common; and (4) within each gene cluster, cumulatively adding, in order of the significance of the gene interaction, the products of the gene expression values of at least one gene pair, and then, according to the prognosis type, using a decision tree model to generate the genes that predict the prognosis type.
The method of claim 16, wherein the prognosis type is recurrence or metastasis.
The method of claim 16, wherein the excised tissue is taken from surgically resected cancer tissue or obtained by core needle biopsy.
The method of claim 16, wherein the cancer is liver cancer.
The method of claim 16, wherein the excised tissue is selected from tumor tissue and/or the surrounding normal tissue.
The method of claim 16, wherein the gene expression value is the gene expression intensity of a single gene in a single tissue, or the gene expression ratio obtained by dividing the gene expression intensities of a tumor tissue and the surrounding normal tissue.
The method of claim 16, wherein step (2) further comprises a data standardization process, the data standardization being z-score normalization.
The method of claim 16, wherein the gene pairs with strong gene-gene interaction in step (3) are the genes that satisfy formula (1): $-\log_{10}(p(v_A \times v_B)) \geq C_{th} - \log_{10}(p(v_A)) - \log_{10}(p(v_B))$ (1), wherein the value of $C_{th}$ is greater than or equal to 1.5.
The method of claim 16, wherein the significance of the interaction in step (4) is calculated using formula (2): $\mathrm{sig\_diff} = \log_{10}(p(v_A)) + \log_{10}(p(v_B)) - \log_{10}(p(v_A \times v_B))$ (2).
The method of claim 16, wherein step (4) further uses the 0.632 bootstrap method to select an appropriate decision tree.
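Formulas (1) and (2) can be read together: if the comparison in formula (1) is taken to be "greater than or equal to" (an assumption, since the operator is not legible in the extracted text), formula (1) rearranges to sig_diff >= C_th. The Python sketch below applies that reading to precomputed p-values. How each p-value is obtained is not stated in the claims, so the p-values are plain inputs here, and the gene names, numbers and function names are hypothetical.

```python
import math
from typing import Iterable, List, Tuple

# (gene_a, gene_b, p(v_A), p(v_B), p(v_A x v_B)); p-values are supplied by the caller.
PairStats = Tuple[str, str, float, float, float]


def sig_diff(p_a: float, p_b: float, p_ab: float) -> float:
    """Formula (2): log10 p(v_A) + log10 p(v_B) - log10 p(v_A x v_B)."""
    return math.log10(p_a) + math.log10(p_b) - math.log10(p_ab)


def strong_pairs(pairs: Iterable[PairStats], c_th: float = 1.5) -> List[Tuple[str, str, float]]:
    """Keep pairs with sig_diff >= c_th, most significant interaction first."""
    scored = [(a, b, sig_diff(p_a, p_b, p_ab)) for a, b, p_a, p_b, p_ab in pairs]
    kept = [item for item in scored if item[2] >= c_th]
    return sorted(kept, key=lambda item: item[2], reverse=True)


# Made-up p-values for three candidate pairs (illustrative only).
candidates = [
    ("G1", "G2", 0.1, 0.1, 1e-4),    # sig_diff is about 2: kept
    ("G1", "G3", 0.1, 0.01, 1e-6),   # sig_diff is about 3: kept, ranked first
    ("G4", "G5", 0.01, 0.01, 1e-4),  # sig_diff is about 0: dropped
]

for gene_a, gene_b, score in strong_pairs(candidates):
    print(gene_a, gene_b, round(score, 3))
```

In this toy input G1 appears in both retained pairs, so it would play the role of a shared gene around which a gene cluster is formed in step (3).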
TW101110194A 2012-03-23 2012-03-23 A genetic combination and method for predicting the risk of recurrence or metastasis in cancer patients TWI450968B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW101110194A TWI450968B (en) 2012-03-23 2012-03-23 A genetic combination and method for predicting the risk of recurrence or metastasis in cancer patients

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW101110194A TWI450968B (en) 2012-03-23 2012-03-23 A genetic combination and method for predicting the risk of recurrence or metastasis in cancer patients

Publications (2)

Publication Number Publication Date
TW201339310A TW201339310A (en) 2013-10-01
TWI450968B TWI450968B (en) 2014-09-01

Family

ID=49770775

Family Applications (1)

Application Number Title Priority Date Filing Date
TW101110194A TWI450968B (en) 2012-03-23 2012-03-23 A genetic combination and method for predicting the risk of recurrence or metastasis in cancer patients

Country Status (1)

Country Link
TW (1) TWI450968B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI630501B (en) * 2016-07-29 2018-07-21 長庚醫療財團法人林口長庚紀念醫院 Establishment of a cancer prediction model and a method for analyzing cancer detection results in combination with a tumor marker set
WO2023034955A1 (en) * 2021-09-02 2023-03-09 University Of Pittsburgh-Of The Commonwealth System Of Higher Education Machine learning-based systems and methods for predicting liver cancer recurrence in liver transplant patients
TWI839307B (en) * 2023-05-06 2024-04-11 華聯生物科技股份有限公司 Methods of estimating disease progression and prognosis after treatment in liver cancer patients with a computer

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1661991A4 (en) * 2003-08-24 2007-10-10 Univ Nihon Hepatocellular cancer-associated gene

Also Published As

Publication number Publication date
TWI450968B (en) 2014-09-01

Similar Documents

Publication Publication Date Title
TWI822789B (en) Convolutional neural network systems and methods for data classification
Hayes et al. Gene expression profiling reveals reproducible human lung adenocarcinoma subtypes in multiple independent patient cohorts
US20090319244A1 (en) Binary prediction tree modeling with many predictors and its uses in clinical and genomic applications
CN108064311A (en) Medical prognosis and prediction using the therapeutic response of various kinds of cell signal transduction path activity
CN108064380A (en) Use the prediction of the medical prognosis and therapeutic response of various kinds of cell signal transduction path activity
CN106795565A (en) Method for assessing lung cancer status
US20220392640A1 (en) Systems and methods for predicting therapeutic sensitivity
Lin et al. Evolutionary route of nasopharyngeal carcinoma metastasis and its clinical significance
WO2020034543A1 (en) Marker for breast cancer diagnosis and screening method therefor
CN114164276B (en) Kit, device and method for lung cancer diagnosis
TWI450968B (en) A genetic combination and method for predicting the risk of recurrence or metastasis in cancer patients
US20240371520A1 (en) Prediction of BRCAness/Homologous Recombination Deficiency of Breast Tumors on Digitalized Slides
CN108350507B (en) Methods for histological diagnosis and treatment of disease
Liu et al. Survival time prediction of breast cancer patients using feature selection algorithm crystall
Park et al. Intraoperative diagnosis support tool for serous ovarian tumors based on microarray data using multicategory machine learning
Akter et al. A data mining approach for biomarker discovery using transcriptomics in endometriosis
CN117038067A (en) Neuroendocrine type prostate cancer risk prediction method and application thereof
CN110770849A (en) PRAEGNANT prognostic indicators of poor outcome in a group of metastatic breast cancers
Yu et al. Insulin-like growth factor binding protein 2: a core biomarker of left ventricular dysfunction in dilated cardiomyopathy
CN111172285A (en) miRNA group for early diagnosis and/or prognosis monitoring of pancreatic cancer and application thereof
Yu et al. Identification of PHACTR4 as A New Biomarker for Diabetic Nephropathy and Its Correlation with Glomerular Endothelial Dysfunction and Immune Infiltration.
CN116312814B (en) Construction method, equipment, device and kit of lung adenocarcinoma molecular typing model
Tchagang et al. Group-biomarkers identification in ovarian carcinoma
Lai et al. Screening Model for Bladder Cancer Early Detection With Serum miRNAs Based on Machine Learning: A Mixed‐Cohort Study Based on 16,189 Participants
Qi et al. Exploring the predictive values of SERP4 and FRZB in dilated cardiomyopathy based on an integrated analysis