gaBERT: An Interpretable Pretrained Deep Learning Framework for Cancer Gene Marker Discovery

Bioinformatics Research and Applications (ISBRA 2024)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 14954)


Abstract

Cancer classification based on gene expression profiles and the identification of significant oncogenes have become active research topics in bioinformatics. However, the complexity, high dimensionality, and limited sample size of gene expression data make comprehensive analysis challenging. Deep learning networks have achieved great success in addressing such issues, but neural network models are mostly regarded as “black box” approaches, and their limited interpretability remains a bottleneck. We propose gaBERT, an interpretable pre-trained deep learning framework for identifying genetic markers. Leveraging BERT's pre-training and fine-tuning methodology, gaBERT gains a general understanding of gene interaction patterns through pre-training on a large amount of unlabeled gene expression data; it then transfers this knowledge to new cancer expression data for supervised fine-tuning, ultimately producing a list of genes that contribute significantly to specific disease phenotypes. Experiments demonstrate gaBERT's good performance in cancer prediction, tumor-related gene identification, and model interpretability.
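As a rough illustration of the two ends of the workflow summarized in the abstract, the sketch below shows how a self-supervised pre-training input could be built by hiding some genes in an unlabeled expression profile, and how a marker list could be produced by ranking genes on per-gene relevance scores. This is not the authors' implementation. The function names `mask_profile` and `rank_genes`, the 15% masking fraction, and all expression and relevance values are assumptions made up for illustration; the model itself is omitted.

```python
import random

MASK = None  # hypothetical placeholder for a hidden expression value


def mask_profile(profile, mask_fraction=0.15, seed=0):
    """Hide a random subset of genes in one expression profile.

    `profile` maps gene name -> expression value. Returns the masked
    profile and the hidden values a model would learn to reconstruct.
    The 15% default is an assumption borrowed from BERT-style masking.
    """
    rng = random.Random(seed)
    genes = sorted(profile)
    n_mask = max(1, round(mask_fraction * len(genes)))
    hidden = set(rng.sample(genes, n_mask))
    masked = {g: (MASK if g in hidden else v) for g, v in profile.items()}
    targets = {g: profile[g] for g in hidden}
    return masked, targets


def rank_genes(relevance, top_k=3):
    """Return the top_k genes by absolute relevance score, highest first."""
    ranked = sorted(relevance, key=lambda g: abs(relevance[g]), reverse=True)
    return ranked[:top_k]


# Toy, invented values: one expression profile and one set of relevance
# scores such as an interpretability method might assign after fine-tuning.
profile = {"TP53": 5.2, "EGFR": 7.9, "ERBB2": 3.1,
           "MYC": 6.4, "KRAS": 4.8, "VEGFA": 2.0}
scores = {"TP53": 0.61, "EGFR": -0.72, "ERBB2": 0.10,
          "MYC": 0.05, "KRAS": 0.33, "VEGFA": -0.02}

masked, targets = mask_profile(profile)
print(rank_genes(scores))  # ['EGFR', 'TP53', 'KRAS']
```

Ranking by absolute value treats strongly negative and strongly positive contributions as equally informative; whether gaBERT does the same is not stated in the abstract.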

Acknowledgements

This groundbreaking research was made possible thanks to the support from both the Shenzhen Science and Technology Program (Grant No. SGDX20201103095603009) and the Strategic Priority Research Program of Chinese Academy of Sciences (Grant No. XDB38050100). We also extend our thanks to all individuals and institutions that have contributed to the success of this project through their expertise, resources, and guidance.

Author information

Corresponding authors

Correspondence to Xinzhe Pang or Yunpeng Cai.

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary file2 (PDF 260 kb)

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Hou, J., Wang, Z., Lu, H., Pang, X., Cai, Y. (2024). gaBERT: An Interpretable Pretrained Deep Learning Framework for Cancer Gene Marker Discovery. In: Peng, W., Cai, Z., Skums, P. (eds) Bioinformatics Research and Applications. ISBRA 2024. Lecture Notes in Computer Science, vol 14954. Springer, Singapore. https://doi.org/10.1007/978-981-97-5128-0_32

  • DOI: https://doi.org/10.1007/978-981-97-5128-0_32

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-5127-3

  • Online ISBN: 978-981-97-5128-0

  • eBook Packages: Computer Science, Computer Science (R0)
