research-article

Building for tomorrow: Assessing the temporal persistence of text classifiers

Published: 01 March 2023

Abstract

Performance of text classification models tends to drop over time due to changes in data, which limits the lifetime of a pretrained model. The ability to predict how well a model will persist over time can therefore help in designing models that remain effective for longer. In this paper, we provide a thorough discussion of the problem and establish an evaluation setup for the task. We look at this problem from a practical perspective by assessing the ability of a wide range of language models and classification algorithms to persist over time, as well as how dataset characteristics can help predict the temporal stability of different models. We perform longitudinal classification experiments on three datasets spanning between 6 and 19 years and involving diverse tasks and types of data. By splitting the longitudinal datasets into years, we perform a comprehensive set of experiments training and testing across data that are different numbers of years apart from each other, both in the past and in the future. This enables a gradual investigation into the impact of the temporal gap between training and test sets on classification performance, as well as a measurement of the extent of persistence over time. Through experiments with a range of language models and algorithms, we observe a consistent trend of performance drop over time, which however differs significantly across datasets: datasets whose domain is more closed and whose language is more stable, such as book reviews, exhibit a less pronounced performance drop than open-domain social media datasets where language varies significantly more. We find that one can estimate how a model will retain its performance over time based on (i) how well the model performs over a restricted time period and its extrapolation to a longer time period, and (ii) the linguistic characteristics of the dataset, such as the familiarity score between subsets from different years. Findings from these experiments have important implications for the design of text classification models with the aim of preserving performance over time.
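To make the evaluation setup concrete, the sketch below illustrates the protocol described above: train a classifier on one year's data, test it on every other year, and group scores by the signed temporal gap. This is a minimal illustration, not the authors' code; the TF-IDF + linear SVM pipeline, the per-year dict inputs, and the macro-F1 metric are assumptions made for the example.

```python
from collections import defaultdict

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def temporal_gap_scores(texts_by_year, labels_by_year):
    """Train on each year, test on every year, and key the scores by the
    signed gap (in years) between training and test sets.

    texts_by_year / labels_by_year: dicts mapping an int year to a list of
    raw documents / a list of class labels (hypothetical input format).
    """
    scores = defaultdict(list)
    for train_year, train_texts in texts_by_year.items():
        # A simple, fast baseline classifier; any model could be slotted in.
        model = make_pipeline(TfidfVectorizer(min_df=2), LinearSVC())
        model.fit(train_texts, labels_by_year[train_year])
        for test_year, test_texts in texts_by_year.items():
            gap = test_year - train_year  # negative gap = testing on the past
            preds = model.predict(test_texts)
            scores[gap].append(
                f1_score(labels_by_year[test_year], preds, average="macro"))
    return scores

# Averaging scores[gap] over all training years gives a performance-vs-gap
# curve; its slope indicates how quickly a model's performance decays.
```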

Highlights

We shed light on the temporal persistence of existing language models.
We analyse when and why model performance drops over time, which informs when a model needs adapting.
We investigate the impact of classification model choice on cross-temporal performance.
We analyse the impact of dataset properties on performance drop over time (see the sketch after this list).
We assess the potential and limitations of contextual language models to improve temporal persistence.
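
The abstract and the highlight on dataset properties refer to a familiarity score between subsets from different years. The paper's exact formulation is not reproduced on this page, so the sketch below uses a simple stand-in assumed for illustration: the fraction of token occurrences in one year's texts whose (lowercased, whitespace-split) tokens were already seen in another year's vocabulary. Higher values suggest more stable language and hence, per the paper's findings, a smaller expected performance drop.

```python
from collections import Counter


def familiarity(train_texts, test_texts):
    """Share of test-year token occurrences covered by the training-year
    vocabulary (an illustrative stand-in for the paper's familiarity score)."""
    train_vocab = {tok for text in train_texts for tok in text.lower().split()}
    test_counts = Counter(tok for text in test_texts for tok in text.lower().split())
    total = sum(test_counts.values())
    covered = sum(n for tok, n in test_counts.items() if tok in train_vocab)
    return covered / total if total else 0.0
```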


Cited By

  • (2024) Hate Speech Detection and Reclaimed Language: Mitigating False Positives and Compounded Discrimination. In Proceedings of the 16th ACM Web Science Conference, pp. 241–249. DOI: 10.1145/3614419.3644025. Online publication date: 21 May 2024.
  • (2024) LongEval: Longitudinal Evaluation of Model Performance at CLEF 2024. In Advances in Information Retrieval, pp. 60–66. DOI: 10.1007/978-3-031-56072-9_8. Online publication date: 24 March 2024.
  • (2023) Quantifying the Transience of Social Web Datasets. In Proceedings of the International Conference on Advances in Social Networks Analysis and Mining, pp. 286–293. DOI: 10.1145/3625007.3627596. Online publication date: 6 November 2023.



Published In

Information Processing and Management: an International Journal, Volume 60, Issue 2 (March 2023), 1443 pages

Publisher

Pergamon Press, Inc., United States


          Author Tags

          1. Text classification
          2. Temporal embedding
          3. Temporal generalisability
          4. Temporal persistence
          5. Deep learning
          6. Pretrained language models
