
Protein Captioning: Bridging the Gap between Protein Sequences and Natural Languages

Online AM: 21 November 2024

Abstract

We introduce the multimodal task of Protein Captioning, which offers an easy-to-understand and flexible way to analyze proteins. Compared to specific protein recognition or classification tasks, such as enzyme reaction classification and gene ontology term prediction, protein captioning provides comprehensive textual descriptions of proteins, thus playing a key role in bridging the gap between protein sequences and natural languages. To address the task, we propose a simple yet effective method, the Protein-to-Text Generative Pre-trained Transformer (P2T-GPT), which fuses multimodal embeddings and translates the chain of amino acid residues in a protein into a sequence of natural language words, i.e., text. To evaluate protein captioning, we collect the ProteinCap dataset, which contains 94,454 protein-text pairs. Experiments on ProteinCap demonstrate the effectiveness of the proposed P2T-GPT: for example, our method improves BERTScore by 8.74, 10.03, and 11.05 over the baseline model on ProteinCap-\(\alpha\), -\(\beta\), and -\(\gamma\), respectively. As minor contributions, we first show that P2T-GPT provides a way to connect protein science with Large Language Models (LLMs): by coupling it with ChatGPT, our method can answer questions about a given protein in a conversational way. Second, we show that protein captioning can serve as a pre-training task that benefits a range of downstream tasks to a certain extent.
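To make the task concrete, below is a minimal sketch of a protein-to-text captioner: an encoder over amino-acid residue tokens feeding a causal text decoder. This illustrates only the general encoder-decoder idea behind the task, not the authors' P2T-GPT architecture; the vocabulary sizes, dimensions, and class names are hypothetical.

```python
# A minimal, self-contained protein-to-text captioning sketch in PyTorch.
# NOT the P2T-GPT architecture from the paper; all hyperparameters are assumptions.
import torch
import torch.nn as nn

AMINO_ACIDS = 20     # the 20 standard residues, one index each (hypothetical encoding)
TEXT_VOCAB = 30522   # e.g., a BERT-style wordpiece vocabulary (assumption)
D_MODEL = 256

class ProteinCaptioner(nn.Module):
    """Encode residue tokens, then decode caption tokens autoregressively."""
    def __init__(self):
        super().__init__()
        self.residue_embed = nn.Embedding(AMINO_ACIDS, D_MODEL)
        self.text_embed = nn.Embedding(TEXT_VOCAB, D_MODEL)
        self.transformer = nn.Transformer(
            d_model=D_MODEL, nhead=8,
            num_encoder_layers=4, num_decoder_layers=4,
            batch_first=True,
        )
        self.lm_head = nn.Linear(D_MODEL, TEXT_VOCAB)

    def forward(self, residues, caption):
        # residues: (B, L_res) residue indices; caption: (B, L_txt) token ids
        src = self.residue_embed(residues)
        tgt = self.text_embed(caption)
        # Causal mask: each caption position attends only to earlier tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(caption.size(1))
        hidden = self.transformer(src, tgt, tgt_mask=mask)
        return self.lm_head(hidden)  # (B, L_txt, TEXT_VOCAB) next-token logits

model = ProteinCaptioner()
residues = torch.randint(0, AMINO_ACIDS, (2, 128))  # toy batch: 2 proteins, 128 residues
caption = torch.randint(0, TEXT_VOCAB, (2, 32))     # toy caption prefixes
print(model(residues, caption).shape)               # torch.Size([2, 32, 30522])
```

A full system along the lines described in the abstract would initialize both sides from pre-trained models (a protein language model on the encoder side, a GPT-style decoder on the text side) rather than training from scratch as this toy sketch does.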
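The BERTScore improvements quoted above compare generated captions against reference descriptions. As a sketch, such an evaluation can be run with the open-source bert-score package; the candidate and reference texts below are invented for illustration:

```python
# pip install bert-score
from bert_score import score

# Hypothetical model output vs. a reference description for one protein.
candidates = ["This protein is an enzyme that catalyzes the hydrolysis of peptide bonds."]
references = ["A protease that cleaves peptide bonds during protein degradation."]

# score() returns per-pair precision, recall, and F1 tensors.
P, R, F1 = score(candidates, references, lang="en", verbose=False)
print(f"BERTScore F1: {F1.mean().item():.4f}")
```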


      Published In

ACM Transactions on Multimedia Computing, Communications, and Applications (Just Accepted)
EISSN: 1551-6865
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Online AM: 21 November 2024
      Accepted: 03 November 2024
      Revised: 16 September 2024
      Received: 11 June 2024

      Author Tags

      1. Protein captioning
      2. Natural language processing
      3. Multimodal learning

      Qualifiers

      • Research-article
