Conceptualization topic modeling

Yi-Kun Tang¹˒², Xian-Ling Mao¹, Heyan Huang¹, Xuewen Shi¹ & Guihua Wen³

Abstract

Topic modeling has recently been widely used to discover abstract topics in the multimedia field. Most existing topic models assume a three-layer hierarchical Bayesian structure: each document is modeled as a probability distribution over topics, and each topic as a probability distribution over words. However, this assumption is not optimal. Intuitively, it is more reasonable to assume that each topic is a probability distribution over concepts and each concept is a probability distribution over words, i.e. to add a latent concept layer between the topic layer and the word layer of the traditional three-layer structure. In this paper, we verify the proposed assumption by incorporating it into two representative topic models, obtaining two novel topic models. Extensive experiments comparing the proposed models with their corresponding baselines show that the proposed models significantly outperform the baselines in terms of case study and perplexity, which indicates that the new assumption is more reasonable than the traditional one.
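The layered assumption described above can be illustrated with a minimal generative sketch: a document draws a distribution over topics, each token picks a topic, then a concept given that topic, then a word given that concept. This is only an illustration and not the authors' model or inference procedure. The function names, the sizes, the symmetric Dirichlet priors, and the hyperparameter values are all hypothetical choices made for the example.

```python
import random

def sample_dirichlet(alpha, size, rng):
    """Draw a symmetric Dirichlet sample by normalizing Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]

def sample_index(probs, rng):
    """Draw one index from a discrete probability distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_document(n_words, topic_concept, concept_word, alpha, rng):
    """Generate word ids through the chain document -> topic -> concept -> word."""
    n_topics = len(topic_concept)
    theta = sample_dirichlet(alpha, n_topics, rng)  # document-topic distribution
    words = []
    for _ in range(n_words):
        z = sample_index(theta, rng)             # topic for this token
        c = sample_index(topic_concept[z], rng)  # concept given the topic
        w = sample_index(concept_word[c], rng)   # word given the concept
        words.append(w)
    return words

# Hypothetical sizes and hyperparameters, chosen only for illustration.
N_TOPICS, N_CONCEPTS, VOCAB = 3, 5, 20
rng = random.Random(0)
topic_concept = [sample_dirichlet(0.5, N_CONCEPTS, rng) for _ in range(N_TOPICS)]
concept_word = [sample_dirichlet(0.5, VOCAB, rng) for _ in range(N_CONCEPTS)]
doc = generate_document(50, topic_concept, concept_word, alpha=0.5, rng=rng)
```

Dropping the concept step and letting each topic index a word distribution directly recovers the traditional three-layer structure, which is the baseline assumption the abstract argues against.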

Notes

Source code can be found at https://github.com/anonymity01/CLDA

Acknowledgements

This work was supported by the 863 Program (2015AA015404), the China National Science Foundation (61402036, 60973083, 61273363), the Beijing Technology Project (Z151100001615029), the Science and Technology Planning Project of Guangdong Province (2014A010103009, 2015A020217002), the Guangzhou Science and Technology Planning Project (201604020179), and the Open Fund Project of Fujian Provincial Key Laboratory of Information Processing and Intelligent Control (Minjiang University) (No. MJUKF201738).

Author information

Authors and Affiliations

School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
Yi-Kun Tang, Xian-Ling Mao, Heyan Huang & Xuewen Shi
Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, Minjiang University, Fuzhou, 350121, China
Yi-Kun Tang
Department of Computer Science and Technology, South China University of Technology, Guangzhou Shi, 510630, Guangdong Sheng, China
Guihua Wen

Corresponding author

Correspondence to Xian-Ling Mao.

About this article

Cite this article

Tang, YK., Mao, XL., Huang, H. et al. Conceptualization topic modeling. Multimed Tools Appl 77, 3455–3471 (2018). https://doi.org/10.1007/s11042-017-5145-4

Received: 03 August 2017
Revised: 20 August 2017
Accepted: 22 August 2017
Published: 07 September 2017
Issue Date: February 2018
DOI: https://doi.org/10.1007/s11042-017-5145-4
