DOI: 10.1145/3386052.3386077
research-article

Identification of the Association between Hepatitis B Virus and Liver Cancer using Machine Learning Approaches based on Amino Acid

Published: 18 May 2020

Abstract

Primary liver cancer is a common cause of cancer death worldwide. The most common type of primary liver cancer is hepatocellular carcinoma (HCC), and the major cause of HCC is chronic infection with hepatitis B virus (HBV). In this research, we used next-generation sequencing (NGS), which is widely used to produce deep, efficient, and high-quality sequence data, to sequence the pre-S region of the HBV genome in 139 patients: 94 HCC patients and 45 chronic HBV (CHB) patients. We generated two types of datasets. First, for the amino acid occurrence frequency data, we used the basic local alignment search tool (BLAST) to map each NGS short read and translated each alignment into amino acids using the DNA codon table. The input features are the occurrence frequencies of the 20 basic amino acids, derived using Shannon entropy. We set aside 40 patients (27 HCC and 13 CHB) as the independent testing set. We then used machine learning methods, namely logistic regression, random forest, and support vector machine (SVM), to construct classification models and make predictions. The AUC values on the independent testing set for logistic regression, random forest, and SVM are 0.946, 0.923, and 0.960, respectively. Second, for the amino acid word pattern frequency data, we calculated the word pattern frequencies of amino acids for all individuals and compared them using Euclidean distance. The input features are the frequencies of amino acid words of length 2, normalized by dividing by the total number of occurrences of all words. These word pattern frequencies were also used to construct classification models for HCC status using machine learning methods. Principal coordinate analysis (PCoA) was used to visualize the associations between patient clusters, the HCC disease status of patients, and the fraction of HBV genotypes.
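The word pattern features described above (frequencies of amino acid words of length 2, normalized by the total number of word occurrences, and compared between individuals by Euclidean distance) can be illustrated with a minimal sketch. This is not the authors' code: the names `word_frequencies` and `euclidean`, the one-letter amino acid alphabet, and the choice to skip words containing any other symbol are assumptions made for illustration.

```python
from collections import Counter
from itertools import product
from math import sqrt

# Assumption: sequences use the standard one-letter codes of the 20 amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
K = 2  # word length used in the abstract
WORDS = ["".join(p) for p in product(AMINO_ACIDS, repeat=K)]  # 400 possible words


def word_frequencies(sequences):
    """Return the normalized frequency of every length-K word for one individual.

    `sequences` is an iterable of translated amino acid strings (hypothetical
    input; the abstract does not describe the data layout).
    """
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - K + 1):
            word = seq[i:i + K]
            # Assumption: words with stop codons or ambiguous symbols are skipped.
            if all(c in AMINO_ACIDS for c in word):
                counts[word] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(WORDS)
    # Normalize by the total number of occurrences of all words.
    return [counts[w] / total for w in WORDS]


def euclidean(u, v):
    """Euclidean distance between two frequency vectors of equal length."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    patient_a = word_frequencies(["MKVMKV"])  # toy sequences, not real data
    patient_b = word_frequencies(["MKV"])
    print(euclidean(patient_a, patient_b))
```

A matrix of such pairwise distances is the kind of input that PCoA operates on, and the 400-dimensional frequency vectors could serve as classifier features.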
We found that word patterns are powerful for analyzing HBV sequences at the amino acid level, because the AUC values of the classification models for all machine learning methods are above 0.9. Hence, our study shows that word pattern frequencies of amino acids are powerful for revealing the underlying principles of the occurrence of HCC triggered by HBV. Our essential findings consist of three parts. First, all machine learning methods can generate classification models with high AUC values. Second, we can identify certain amino acid positions or amino acid word patterns at which mutations will induce HCC. Last, the PCoA results are associated with the disease status (HCC or CHB) and the fraction of genotype B (or C).
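The AUC values reported in the abstract measure how well a model's scores rank HCC patients above CHB patients on the independent testing set. As a hedged illustration of the metric itself (not of the authors' pipeline), the sketch below computes AUC as the fraction of positive/negative pairs in which the positive case scores higher, counting ties as half. The function name `auc`, the label coding (1 = HCC, 0 = CHB), and the toy scores are assumptions.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for the positive class (assumed here: HCC), 0 for the negative
            class (assumed here: CHB).
    scores: model outputs where a higher value means "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and scores, not data from the study.
    print(auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))
```

With a testing set of 27 HCC and 13 CHB patients, this definition compares 27 × 13 = 351 patient pairs, so an AUC of 1.0 would mean every HCC patient scored higher than every CHB patient.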

Cited By

  • (2024) Identifying Proteins and Amino Acids Associated with Liver Cancer Risk: A Study Utilizing Mendelian Randomization and Bulk RNA Sequencing Analysis. Journal of Personalized Medicine 14:3 (262). DOI: 10.3390/jpm14030262. Online publication date: 28-Feb-2024
  • (2024) A Machine Learning‐Based Framework for Accurate and Early Diagnosis of Liver Diseases: A Comprehensive Study on Feature Selection, Data Imbalance, and Algorithmic Performance. International Journal of Intelligent Systems 2024:1. DOI: 10.1155/2024/6111312. Online publication date: 28-Jun-2024
  • (2021) Machine Learning‐Based Virus Type Classification Using Transmission Electron Microscopy Virus Images. Machine Vision Inspection Systems, Volume 2 (1-22). DOI: 10.1002/9781119786122.ch1. Online publication date: 15-Jan-2021

    Published In

    ICBBB '20: Proceedings of the 2020 10th International Conference on Bioscience, Biochemistry and Bioinformatics
    January 2020
    160 pages
    ISBN:9781450376761
    DOI:10.1145/3386052
    In-Cooperation

    • National University of Singapore
    • RIED, Tokai University, Japan

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Classification
    2. Machine learning
    3. Word pattern frequency

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICBBB '20
