
Topic change point detection using a mixed Bayesian model

  • Published:
Data Mining and Knowledge Discovery

Abstract

Dynamic text documents such as news articles, user reviews, and blogs are now common in many fields, and the topics underlying these text streams change over time. To track topic changes in ever-growing collections of text documents, automatic text analysis models that can identify the key changes in topics are much needed. To this end, this study proposes a topic change point detection (Topic-CD) model. Unlike previous studies, we define the change points of topics in terms of the hyperparameters associated with the topic-word distributions. This allows the model to detect change points that affect the whole set of topics. Under this definition, topic modeling and change point detection are combined in a unified framework and performed simultaneously using a Markov chain Monte Carlo algorithm. In addition, the Topic-CD model does not require the number of change points to be set in advance, which makes it more convenient in practice. We evaluate the performance of the Topic-CD model numerically on synthetic data and three real datasets. The results show that the Topic-CD model identifies change points in topics well compared with several state-of-the-art methods.
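To make the idea of a change point in word distributions concrete, here is a deliberately simplified sketch. It is not the Topic-CD model: it ignores topics entirely, assumes exactly one change point, places a symmetric Dirichlet(`beta`) prior on each segment's word distribution, and enumerates every candidate location exactly instead of sampling with MCMC. The function names, the value `beta = 0.5`, and the toy counts are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence


def log_marginal(counts: Sequence[int], beta: float) -> float:
    """Log probability of a token sequence with the given per-word counts when the
    word distribution is integrated out under a symmetric Dirichlet(beta) prior."""
    v, n = len(counts), sum(counts)
    total = math.lgamma(v * beta) - math.lgamma(v * beta + n)
    for c in counts:
        total += math.lgamma(beta + c) - math.lgamma(beta)
    return total


def pool(periods: Sequence[Sequence[int]]) -> List[int]:
    """Sum per-period word-count vectors into one segment-level count vector."""
    return [sum(col) for col in zip(*periods)]


def change_point_posterior(periods: Sequence[Sequence[int]], beta: float = 0.5) -> Dict[int, float]:
    """Posterior over a single change point tau (uniform prior on 1..T-1):
    periods[:tau] share one word distribution, periods[tau:] share another."""
    T = len(periods)
    log_scores = {
        tau: log_marginal(pool(periods[:tau]), beta) + log_marginal(pool(periods[tau:]), beta)
        for tau in range(1, T)
    }
    top = max(log_scores.values())  # subtract the max before exponentiating for stability
    weights = {tau: math.exp(s - top) for tau, s in log_scores.items()}
    z = sum(weights.values())
    return {tau: w / z for tau, w in weights.items()}


# Hypothetical toy data: 8 time periods over a 4-word vocabulary;
# usage shifts from words 0-1 to words 2-3 starting at period index 4.
periods = [
    [8, 7, 1, 0], [9, 6, 0, 1], [7, 8, 1, 1], [8, 8, 0, 0],
    [0, 1, 9, 7], [1, 0, 8, 8], [0, 1, 7, 9], [1, 1, 8, 7],
]
post = change_point_posterior(periods)
print(max(post, key=post.get))  # 4 for this toy data
```

Exhaustive enumeration is only feasible here because there is one change point and no latent topics. Allowing an unknown number of change points and placing them on topic-level hyperparameters, as the abstract describes, makes the posterior intractable to enumerate, which is why a sampler such as MCMC would be needed.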





Acknowledgements

We thank Yichao Feng (now working at Jingdong) and Yandi Zhu (now studying at Peking University) for their support in data exploration. This work is also supported by the National Natural Science Foundation of China (Nos. 72171229, 72001205, 11971504), the fund for building world-class universities (disciplines) of Renmin University of China, the Foundation from Ministry of Education of China (20JZD023), and the Ministry of Education Focus on Humanities and Social Science Research Base (Major Research Plan 17JJD910001).

Author information


Corresponding author

Correspondence to Feifei Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Responsible editor: Johannes Fürnkranz.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Lu, X., Guo, Y., Chen, J. et al. Topic change point detection using a mixed Bayesian model. Data Min Knowl Disc 36, 146–173 (2022). https://doi.org/10.1007/s10618-021-00804-1



  • DOI: https://doi.org/10.1007/s10618-021-00804-1

