Abstract
Understanding DNA-protein binding specificity in diverse cell types is essential for revealing the regulatory mechanisms of biological processes. Recently, deep learning has been successfully applied to predict DNA-protein binding sites from large-scale chromatin-profiling data. However, precisely identifying putative binding sites in specific cell types with few labeled samples remains challenging. To this end, we present BindTransNet, a novel transferable Transformer-based method for cross-cell-type DNA-protein binding prediction. Our approach combines transfer learning with a Transformer Encoder to capture long-range dependencies between motifs that are shared across cell types. By leveraging these shared features, the method can recognize putative binding sites without massive numbers of labeled samples. This work is the first to apply a Transformer to DNA-protein binding site prediction. The method is evaluated on the TFs COREST and SRF in four cell types, giving eight cell-type/TF pairs. In both 4-class and binary prediction, BindTransNet significantly outperforms several state-of-the-art methods. Moreover, transfer learning yields considerable performance gains for BindTransNet, a persuasive indication that it can indeed capture features shared with other cell types.
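The abstract does not spell out BindTransNet's layers. As a minimal sketch, assuming one-hot encoded DNA input and a single attention head, the code below shows the scaled dot-product self-attention at the core of a Transformer Encoder (Vaswani et al., 2017), the mechanism that lets each position attend to motifs anywhere else in the sequence. The names `one_hot`, `self_attention`, `random_matrix` and `d_model`, and the random weights, are hypothetical illustrations, not the authors' implementation.

```python
import math
import random

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as 4-dim one-hot vectors in A, C, G, T order.
    Ambiguous bases such as N map to an all-zero vector (an assumption)."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both nested lists."""
    return [
        [sum(row[t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for row in a
    ]


def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.
    Every position attends to every other position, so the output at one
    base can depend on a motif arbitrarily far away in the sequence."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(w_k[0])
    scores = [
        [sum(qi[t] * kj[t] for t in range(d_k)) / math.sqrt(d_k) for kj in k]
        for qi in q
    ]
    weights = [softmax(r) for r in scores]  # each row sums to 1
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    """Hypothetical randomly initialised projection matrix."""
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    d_model = 8  # hypothetical projection size
    w_q, w_k, w_v = (random_matrix(4, d_model, rng) for _ in range(3))
    out, attn = self_attention(one_hot("ACGTTGCA"), w_q, w_k, w_v)
    print(len(out), len(out[0]))  # 8 8
```

In a transfer-learning setup of the general kind the abstract describes, projection weights such as `w_q`, `w_k` and `w_v` would be learned on data-rich source cell types and then reused or fine-tuned on the low-label target cell type; which layers BindTransNet freezes or fine-tunes is not stated here.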
This work is supported by the National Natural Science Foundation of China under Grant No. 61702058 and by the China Postdoctoral Science Foundation under Project No. 2017M612948.
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, Z. et al. (2021). BindTransNet: A Transferable Transformer-Based Architecture for Cross-Cell Type DNA-Protein Binding Sites Prediction. In: Wei, Y., Li, M., Skums, P., Cai, Z. (eds) Bioinformatics Research and Applications. ISBRA 2021. Lecture Notes in Computer Science, vol 13064. Springer, Cham. https://doi.org/10.1007/978-3-030-91415-8_18
DOI: https://doi.org/10.1007/978-3-030-91415-8_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91414-1
Online ISBN: 978-3-030-91415-8
eBook Packages: Computer Science, Computer Science (R0)