
From pre-training to fine-tuning: An in-depth analysis of Large Language Models in the biomedical domain

Published: 01 November 2024

Abstract

In this study, we delve into the adaptation and effectiveness of Transformer-based, pre-trained Large Language Models (LLMs) within the biomedical domain, a field that poses unique challenges due to its complexity and the specialized nature of its data. Building on the foundation laid by the transformative architecture of Transformers, we investigate the nuanced dynamics of LLMs through a multifaceted lens, focusing on two domain-specific tasks, i.e., Natural Language Inference (NLI) and Named Entity Recognition (NER). Our objective is to bridge the knowledge gap regarding how these models’ downstream performances correlate with their capacity to encapsulate task-relevant information. To achieve this goal, we probed and analyzed the inner encoding and attention mechanisms in LLMs, both encoder- and decoder-based, tailored for either general or biomedical-specific applications. This examination occurs before and after the models are fine-tuned across various data volumes. Our findings reveal that the models’ downstream effectiveness is intricately linked to specific patterns within their internal mechanisms, shedding light on the nuanced ways in which LLMs process and apply knowledge in the biomedical context. The source code for this paper is available at https://github.com/agnesebonfigli99/LLMs-in-the-Biomedical-Domain.
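
Below is a minimal, illustrative sketch of the kind of layer-wise probing described above: extract per-layer sentence representations from a pre-trained encoder and fit a simple linear probe on each layer. The checkpoint name (bert-base-uncased) and the toy premise/hypothesis pairs are placeholders, not the models or datasets used in the paper; the actual experimental pipeline is the one in the linked repository.

```python
# Illustrative layer-wise probing sketch (not the paper's exact pipeline).
# Assumptions: a Hugging Face encoder checkpoint and toy premise/hypothesis
# pairs standing in for the NLI data used in the study.
import torch
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"  # placeholder; the paper also studies biomedical variants
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
model.eval()

# Toy entailment-style pairs with binary labels (1 = entailed), purely illustrative.
pairs = [
    ("The patient was given aspirin.", "The patient received medication."),
    ("The scan showed no abnormalities.", "The scan revealed a tumor."),
] * 8
labels = [1, 0] * 8

def layer_cls_vectors(premise, hypothesis):
    """Return the [CLS] vector of every layer for a premise/hypothesis pair."""
    enc = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = model(**enc)
    # out.hidden_states: tuple of (num_layers + 1) tensors of shape [1, seq_len, hidden]
    return [h[0, 0].numpy() for h in out.hidden_states]

# Probe each layer with a linear classifier and compare cross-validated accuracy.
per_layer = list(zip(*[layer_cls_vectors(p, h) for p, h in pairs]))
for layer_idx, feats in enumerate(per_layer):
    probe = LogisticRegression(max_iter=1000)
    score = cross_val_score(probe, list(feats), labels, cv=4).mean()
    print(f"layer {layer_idx:2d}: probe accuracy ~ {score:.2f}")
```

Comparing the per-layer probe scores before and after fine-tuning (and across general-purpose vs. biomedical checkpoints) is the basic shape of the analysis the abstract describes.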

Highlights

Comparison between encoder/decoder LLMs and their domain-adapted versions.
Assessment of the impact of different data volumes on fine-tuning.
Probing and analysis of LLMs’ internal representations and attention mechanisms (see the sketch after this list).
Identification of key internal patterns linked to LLMs’ performances.
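
As referenced in the third highlight above, here is a hedged sketch of how per-head attention maps can be extracted from a Transformer encoder for this kind of attention-pattern analysis; the checkpoint name, example sentence, and per-head entropy statistic are illustrative choices rather than the paper's exact measures.

```python
# Illustrative attention-extraction sketch (not the paper's exact analysis).
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"  # placeholder; swap in a biomedical checkpoint to compare
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)
model.eval()

sentence = "Metformin is commonly prescribed for type 2 diabetes."
enc = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    out = model(**enc)

# out.attentions: tuple of num_layers tensors, each of shape [batch, heads, seq, seq].
for layer_idx, attn in enumerate(out.attentions):
    probs = attn[0]  # [heads, seq, seq]
    # Entropy of each head's attention distribution, averaged over query positions:
    head_entropy = -(probs * torch.log(probs + 1e-12)).sum(dim=-1).mean(dim=-1)
    print(f"layer {layer_idx:2d} mean head entropy: {head_entropy.mean().item():.3f}")
```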



Information

          Published In

Artificial Intelligence in Medicine, Volume 157, Issue C, November 2024, 404 pages

          Publisher

Elsevier Science Publishers Ltd., United Kingdom


          Author Tags

          1. Large Language Models
          2. Biomedical domain
          3. Domain adaptation
          4. Probing tasks
          5. BERT
          6. GPT

          Qualifiers

          • Research-article
