Forecasting COVID-19 Caseloads Using Unsupervised Embedding Clusters of Social Media Posts

Felix Drinkall, Stefan Zohren, Janet Pierrehumbert

Abstract

We present a novel approach incorporating transformer-based language models into infectious disease modelling. Text-derived features are quantified by tracking high-density clusters of sentence-level representations of Reddit posts within specific US states’ COVID-19 subreddits. We benchmark these clustered embedding features against features extracted from other high-quality datasets. In a threshold-classification task, we show that they outperform all other feature types at predicting upward trend signals, a significant result for infectious disease modelling in areas where epidemiological data is unreliable. Subsequently, in a time-series forecasting task, we fully utilise the predictive power of the caseload and compare the relative strengths of using different supplementary datasets as covariate feature sets in a transformer-based time-series model.

Anthology ID: 2022.naacl-main.105
Volume: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month: July
Year: 2022
Address: Seattle, United States
Editors: Marine Carpuat, Marie-Catherine de Marneffe, Ivan Vladimir Meza Ruiz
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 1471–1484
URL: https://aclanthology.org/2022.naacl-main.105
DOI: 10.18653/v1/2022.naacl-main.105
Cite (ACL): Felix Drinkall, Stefan Zohren, and Janet Pierrehumbert. 2022. Forecasting COVID-19 Caseloads Using Unsupervised Embedding Clusters of Social Media Posts. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1471–1484, Seattle, United States. Association for Computational Linguistics.
Cite (Informal): Forecasting COVID-19 Caseloads Using Unsupervised Embedding Clusters of Social Media Posts (Drinkall et al., NAACL 2022)
PDF: https://aclanthology.org/2022.naacl-main.105.pdf
Video: https://aclanthology.org/2022.naacl-main.105.mp4
