Abstract
Topic modeling has recently been widely used to discover abstract topics in the multimedia field. Most existing topic models assume a three-layer hierarchical Bayesian structure: each document is modeled as a probability distribution over topics, and each topic is a probability distribution over words. However, this assumption is not optimal. Intuitively, it is more reasonable to assume that each topic is a probability distribution over concepts and each concept is a probability distribution over words, i.e., to add a latent concept layer between the topic layer and the word layer of the traditional three-layer structure. In this paper, we test the proposed assumption by incorporating it into two representative topic models, which yields two novel topic models. Extensive experiments comparing the proposed models with their corresponding baselines show that the proposed models significantly outperform the baselines in both case studies and perplexity, which indicates that the new assumption is more reasonable than the traditional one.
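To make the four-layer assumption concrete, the following is a minimal sketch of a generative process in which each word is produced by drawing a topic from the document, a concept from that topic, and a word from that concept. It is an illustration only, not the inference procedure or exact model specification of the paper: the symmetric Dirichlet priors, the sizes K, C and V, and all function names are assumptions made for this example. The marginal word probability it computes is p(w | d) = sum over z and c of p(z | d) p(c | z) p(w | c), and the perplexity function uses the standard definition exp(-(1/N) sum log p(w_n | d)).

```python
import math
import random


def sample_dirichlet(rng, alpha, size):
    """Draw one sample from a symmetric Dirichlet(alpha) of the given size."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def sample_index(rng, probs):
    """Draw an index according to the probability vector probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_document(rng, theta, topic_concept, concept_word, length):
    """Generate word ids via document -> topic -> concept -> word."""
    words = []
    for _ in range(length):
        z = sample_index(rng, theta)             # topic drawn from the document
        c = sample_index(rng, topic_concept[z])  # concept drawn from the topic
        w = sample_index(rng, concept_word[c])   # word drawn from the concept
        words.append(w)
    return words


def word_distribution(theta, topic_concept, concept_word):
    """Marginal p(w | d), summing out the latent topic and concept."""
    vocab = len(concept_word[0])
    probs = [0.0] * vocab
    for z, p_z in enumerate(theta):
        for c, p_c in enumerate(topic_concept[z]):
            for w in range(vocab):
                probs[w] += p_z * p_c * concept_word[c][w]
    return probs


def perplexity(words, probs):
    """Standard perplexity of a word sequence under a word distribution."""
    log_lik = sum(math.log(probs[w]) for w in words)
    return math.exp(-log_lik / len(words))


rng = random.Random(0)
K, C, V = 3, 5, 20  # hypothetical numbers of topics, concepts, vocabulary words
theta = sample_dirichlet(rng, 0.5, K)                              # p(topic | document)
topic_concept = [sample_dirichlet(rng, 0.5, C) for _ in range(K)]  # p(concept | topic)
concept_word = [sample_dirichlet(rng, 0.5, V) for _ in range(C)]   # p(word | concept)

doc = generate_document(rng, theta, topic_concept, concept_word, length=50)
p_w = word_distribution(theta, topic_concept, concept_word)
doc_perplexity = perplexity(doc, p_w)
```

Removing the concept layer (drawing the word directly from a topic-word distribution) recovers the traditional three-layer process, which is the only structural difference the sketch is meant to show.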
Notes
Source code can be found at https://github.com/anonymity01/CLDA
Acknowledgements
This work was supported by the 863 Program (2015AA015404), the China National Science Foundation (61402036, 60973083, 61273363), the Beijing Technology Project (Z151100001615029), the Science and Technology Planning Project of Guangdong Province (2014A010103009, 2015A020217002), the Guangzhou Science and Technology Planning Project (201604020179), and the Open Fund Project of Fujian Provincial Key Laboratory of Information Processing and Intelligent Control (Minjiang University) (No. MJUKF201738).
Cite this article
Tang, YK., Mao, XL., Huang, H. et al. Conceptualization topic modeling. Multimed Tools Appl 77, 3455–3471 (2018). https://doi.org/10.1007/s11042-017-5145-4
DOI: https://doi.org/10.1007/s11042-017-5145-4