research-article

Building for tomorrow: Assessing the temporal persistence of text classifiers

Published: 01 March 2023

Abstract

Performance of text classification models tends to drop over time due to changes in data, which limits the lifetime of a pretrained model. The ability to predict how well a model will persist over time can therefore help in designing models that remain effective for longer. In this paper, we provide a thorough discussion of the problem and establish an evaluation setup for the task. We look at this problem from a practical perspective by assessing the ability of a wide range of language models and classification algorithms to persist over time, as well as how dataset characteristics can help predict the temporal stability of different models. We perform longitudinal classification experiments on three datasets spanning between 6 and 19 years and involving diverse tasks and types of data. By splitting the longitudinal datasets into years, we perform a comprehensive set of experiments training and testing across data that are different numbers of years apart from each other, both in the past and in the future. This enables a gradual investigation into the impact of the temporal gap between training and test sets on classification performance, as well as a measurement of the extent of persistence over time. Through experiments with a range of language models and algorithms, we observe a consistent trend of performance drop over time, which however differs significantly across datasets: datasets whose domain is more closed and whose language is more stable, such as book reviews, exhibit a less pronounced performance drop than open-domain social media datasets where language varies significantly more. We find that one can estimate how a model will retain its performance over time based on (i) how well the model performs over a restricted time period and its extrapolation to a longer time period, and (ii) the linguistic characteristics of the dataset, such as the familiarity score between subsets from different years. Findings from these experiments have important implications for the design of text classification models with the aim of preserving performance over time.
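To make the evaluation setup concrete, the sketch below illustrates the protocol described above: train a classifier on one year's data, test it on every other year, and group scores by the signed temporal gap. This is a minimal illustration, not the authors' code; the TF-IDF + linear SVM pipeline, the per-year dict inputs, and the macro-F1 metric are assumptions made for the example.

```python
from collections import defaultdict

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def temporal_gap_scores(texts_by_year, labels_by_year):
    """Train on each year, test on every year, and key the scores by the
    signed gap (in years) between training and test sets.

    texts_by_year / labels_by_year: dicts mapping an int year to a list of
    raw documents / a list of class labels (hypothetical input format).
    """
    scores = defaultdict(list)
    for train_year, train_texts in texts_by_year.items():
        # A simple, fast baseline classifier; any model could be slotted in.
        model = make_pipeline(TfidfVectorizer(min_df=2), LinearSVC())
        model.fit(train_texts, labels_by_year[train_year])
        for test_year, test_texts in texts_by_year.items():
            gap = test_year - train_year  # negative gap = testing on the past
            preds = model.predict(test_texts)
            scores[gap].append(
                f1_score(labels_by_year[test_year], preds, average="macro"))
    return scores

# Averaging scores[gap] over all training years gives a performance-vs-gap
# curve; its slope indicates how quickly a model's performance decays.
```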

Highlights

We shed light on the temporal persistence of existing language models.
We analyse when and why model performance drops over time, which informs when a model needs adapting.
We investigate the impact of classification model choice on cross-temporal performance.
We analyse the impact of dataset properties on performance drop over time (see the sketch after this list).
We assess the potential and limitations of contextual language models to improve temporal persistence.
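
The abstract and the highlight on dataset properties refer to a familiarity score between subsets from different years. The paper's exact formulation is not reproduced on this page, so the sketch below uses a simple stand-in assumed for illustration: the fraction of token occurrences in one year's texts whose (lowercased, whitespace-split) tokens were already seen in another year's vocabulary. Higher values suggest more stable language and hence, per the paper's findings, a smaller expected performance drop.

```python
from collections import Counter


def familiarity(train_texts, test_texts):
    """Share of test-year token occurrences covered by the training-year
    vocabulary (an illustrative stand-in for the paper's familiarity score)."""
    train_vocab = {tok for text in train_texts for tok in text.lower().split()}
    test_counts = Counter(tok for text in test_texts for tok in text.lower().split())
    total = sum(test_counts.values())
    covered = sum(n for tok, n in test_counts.items() if tok in train_vocab)
    return covered / total if total else 0.0
```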


Cited By

  • (2024) Hate Speech Detection and Reclaimed Language: Mitigating False Positives and Compounded Discrimination. In Proceedings of the 16th ACM Web Science Conference, pp. 241–249. DOI: 10.1145/3614419.3644025. Online publication date: 21 May 2024.
  • (2024) LongEval: Longitudinal Evaluation of Model Performance at CLEF 2024. In Advances in Information Retrieval, pp. 60–66. DOI: 10.1007/978-3-031-56072-9_8. Online publication date: 24 March 2024.
  • (2023) Quantifying the Transience of Social Web Datasets. In Proceedings of the International Conference on Advances in Social Networks Analysis and Mining, pp. 286–293. DOI: 10.1145/3625007.3627596. Online publication date: 6 November 2023.



Published In

Information Processing and Management: an International Journal, Volume 60, Issue 2 (March 2023), 1443 pages

Publisher

Pergamon Press, Inc., United States


          Author Tags

          1. Text classification
          2. Temporal embedding
          3. Temporal generalisability
          4. Temporal persistence
          5. Deep learning
          6. Pretrained language models
