Abstract
Dynamic text documents, including news articles, user reviews, and blogs, are now common in many fields, and the topics underlying such text streams change over time. As text documents accumulate, automatic text analysis models are needed to find the key changes in topics. To this end, this study proposes a topic change point detection (Topic-CD) model. Unlike previous studies, we define the change point of topics from the perspective of the hyperparameters associated with the topic-word distributions, which allows the model to detect change points underlying the whole topic set. Under this definition, topic modeling and change point detection are combined in a unified framework and performed simultaneously using a Markov chain Monte Carlo algorithm. In addition, the Topic-CD model does not require the number of change points to be set in advance, which makes it more convenient for practical use. We investigate the performance of the Topic-CD model numerically using synthetic data and three real datasets. The results show that the Topic-CD model identifies the change points in topics well when compared with several state-of-the-art methods.
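To make the idea of a change point in word distributions concrete, the following is a minimal sketch and not the Topic-CD model itself. It assumes a single word distribution per segment (no latent topics), exactly one change point, a symmetric Dirichlet prior with a fixed hyperparameter `beta`, and an exhaustive search over split positions instead of Markov chain Monte Carlo. The function names `log_marginal` and `best_change_point` are hypothetical and chosen only for illustration.

```python
import math


def log_marginal(counts, beta):
    """Log marginal likelihood of a word sequence with the given per-word
    counts, after integrating out a word distribution that has a symmetric
    Dirichlet(beta) prior (Dirichlet-multinomial, sequence form)."""
    vocab_size = len(counts)
    n = sum(counts)
    value = math.lgamma(vocab_size * beta) - math.lgamma(vocab_size * beta + n)
    for c in counts:
        value += math.lgamma(beta + c) - math.lgamma(beta)
    return value


def best_change_point(docs, beta=0.5):
    """Return (t, score) where docs[:t] and docs[t:] form the two segments
    with the highest summed log marginal likelihood.

    docs: list of word-count vectors, ordered in time, all of equal length.
    Assumption: exactly one change point, found by exhaustive search.
    """
    vocab_size = len(docs[0])
    total = [sum(d[v] for d in docs) for v in range(vocab_size)]
    left = [0] * vocab_size
    best_t, best_score = None, -math.inf
    for t in range(1, len(docs)):
        # Move document t-1 into the left segment.
        for v in range(vocab_size):
            left[v] += docs[t - 1][v]
        right = [total[v] - left[v] for v in range(vocab_size)]
        score = log_marginal(left, beta) + log_marginal(right, beta)
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score


if __name__ == "__main__":
    # Toy stream: ten documents dominated by words 0-1, then ten by words 3-4.
    stream = [[8, 8, 2, 1, 1]] * 10 + [[1, 1, 2, 8, 8]] * 10
    t, score = best_change_point(stream)
    print("estimated change point:", t, "log marginal likelihood:", score)
```

The sketch scores each candidate split by how well two separate word distributions explain the two segments. The model described above differs in that it places the change on the hyperparameters of a whole set of topic-word distributions and infers the number of change points rather than fixing it.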
Acknowledgements
We thank Yichao Feng (now working at Jingdong) and Yandi Zhu (now studying at Peking University) for their support in data exploration. This work is also supported by the National Natural Science Foundation of China (Nos. 72171229, 72001205, 11971504), the fund for building world-class universities (disciplines) of Renmin University of China, the Foundation from the Ministry of Education of China (20JZD023), and the Ministry of Education Focus on Humanities and Social Science Research Base (Major Research Plan 17JJD910001).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Responsible editor: Johannes Fürnkranz.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Lu, X., Guo, Y., Chen, J. et al. Topic change point detection using a mixed Bayesian model. Data Min Knowl Disc 36, 146–173 (2022). https://doi.org/10.1007/s10618-021-00804-1
DOI: https://doi.org/10.1007/s10618-021-00804-1