Abstract
Temporal information extraction from social media messages is critical to several geographical applications. Based on the characteristics of temporal descriptions in Chinese text, the time expression patterns formed by combinations of time units are summarized. A deep learning-based algorithm, BERT-BiLSTM-CRF, is proposed for automatically extracting temporal information from social media messages. Building on the bidirectional long short-term memory-conditional random field (BiLSTM-CRF) model, the BERT (bidirectional encoder representations from transformers) pretrained language model is used to improve the generalization ability of the word vectors and to capture long-range contextual information; the resulting word vectors are then fed into the BiLSTM-CRF model for further training. The proposed model is evaluated on a purpose-built corpus of manually annotated Chinese social media messages. Compared with the baseline models, BERT-BiLSTM-CRF achieved the highest average F1-score, 85%. The experimental results show that the proposed method outperforms the current state-of-the-art models.
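The final stage of a BiLSTM-CRF tagger selects the best tag sequence for a whole sentence rather than the best tag per token. The following is a minimal sketch of that decoding step (Viterbi search), not the authors' implementation. It assumes a hypothetical BIO tag set for time expressions (O, B-TIME, I-TIME), and every score in it is made up for illustration: the emission scores stand in for the per-token outputs of the BERT and BiLSTM layers, and the transition scores stand in for what a CRF layer would learn.

```python
from typing import List, Tuple

# Hypothetical BIO tag set for time expressions (an assumption, not from the paper).
TAGS = ["O", "B-TIME", "I-TIME"]
FORBIDDEN = -1e4  # large penalty used to rule out invalid moves


def viterbi_decode(
    emissions: List[List[float]],
    transitions: List[List[float]],
    start: List[float],
) -> Tuple[List[str], float]:
    """Return the highest-scoring tag sequence and its total score.

    emissions[t][j]   : score of tag j at token t (stand-in for BiLSTM output)
    transitions[i][j] : score of moving from tag i to tag j
    start[j]          : score of beginning the sentence with tag j
    """
    if not emissions:
        return [], 0.0
    n_tags = len(TAGS)
    # score[j] holds the best score of any path ending in tag j at the current token.
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag to arrive at tag j.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final tag.
    best_last = max(range(n_tags), key=lambda j: score[j])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [TAGS[i] for i in path], score[best_last]


# Made-up scores for a three-token sentence.
emissions = [
    [1.0, 0.9, 0.1],  # token 0: "O" is narrowly the best tag in isolation
    [0.2, 0.3, 1.5],  # token 1: "I-TIME" is the best tag in isolation
    [2.0, 0.1, 0.3],  # token 2: "O" is the best tag in isolation
]
transitions = [
    [0.5, 0.0, FORBIDDEN],  # from O: an I-TIME tag may not follow O
    [-0.5, -1.0, 1.0],      # from B-TIME
    [0.0, -0.5, 0.5],       # from I-TIME
]
start = [0.0, 0.0, FORBIDDEN]  # a sentence may not start with I-TIME

tags, total = viterbi_decode(emissions, transitions, start)
print(tags)  # prints ['B-TIME', 'I-TIME', 'O']
```

Picking the best tag for each token independently would give O, I-TIME, O here, which is not a valid BIO sequence. The transition scores let the decoder prefer B-TIME for the first token, and this sequence-level consistency is the usual motivation for placing a CRF layer on top of a BiLSTM.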
Acknowledgements
The study was supported by the National Natural Science Foundation of China (Nos. 42050101, U1711267, 41871311, 41871305), the Open Research Project of the Hubei Key Laboratory of Intelligent Geo-Information Processing (No. KLIGIP-2021A01), the Major Scientific and Technological Innovation Projects in Shandong Province (2019JZZY020105), the China Postdoctoral Science Foundation (No. 2021M702991), and the Fundamental Research Funds for the Central Universities, China University of Geosciences (Wuhan) (No. CUG2106116).
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
About this article
Cite this article
Ma, K., Tan, Y., Tian, M. et al. Extraction of temporal information from social media messages using the BERT model. Earth Sci Inform 15, 573–584 (2022). https://doi.org/10.1007/s12145-021-00756-6
DOI: https://doi.org/10.1007/s12145-021-00756-6