Abstract
This paper proposes a method for embedding prior knowledge about topics of interest into Latent Dirichlet Allocation (LDA). Conventional LDA sometimes fails to detect specific topics of interest, so our approach uses word2vec to acquire linkages between words related to those topics. The extracted linkages serve as prior knowledge about the topics in the subsequent LDA process, and they can also be used to annotate words consistently; such consistent annotation cannot be achieved with conventional LDA, which relies on bag-of-words-based clustering. We evaluate our approach on travelers' reviews, detecting topics related to Japanese shrines. The experimental results show that our approach is effective in three respects: (1) its average coherence, i.e., the semantic consistency among topic words, exceeds that of conventional LDA; (2) the words in each sentence receive annotations that consistently reflect the topic of the sentence, whereas conventional LDA sometimes assigns confusing, mixed annotations to the words of a single sentence; and (3) it can detect very specific topics that match users' interests.
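To make the pipeline concrete, the following is a minimal sketch of the general idea rather than the authors' exact formulation: word2vec expands a small set of seed words for a topic of interest into a set of linked words, and those linked words are then given extra mass in the topic-word prior (eta) of a gensim LDA model. The toy corpus, seed words, boost value, and the use of gensim are illustrative assumptions, not details taken from the paper.

```python
# A minimal sketch, not the authors' exact method: expand seed words with
# word2vec similarity, then bias LDA's topic-word prior toward the linked words.
# The toy corpus, seed words, and boost value are illustrative assumptions.
import numpy as np
from gensim.corpora import Dictionary
from gensim.models import CoherenceModel, LdaModel, Word2Vec

# Toy corpus of tokenized review sentences (placeholder data).
sentences = [
    ["shrine", "torii", "gate", "beautiful"],
    ["shrine", "priest", "ritual", "ceremony"],
    ["street", "food", "stall", "snack"],
    ["garden", "pond", "bridge", "walk"],
]

# 1. Learn word vectors and expand the seed words for the topic of interest
#    via embedding similarity (the "linkages").
w2v = Word2Vec(sentences, vector_size=50, window=3, min_count=1, seed=0)
seed_words = ["shrine"]
linked = set(seed_words)
for word in seed_words:
    linked.update(w for w, _ in w2v.wv.most_similar(word, topn=3))

# 2. Encode the linkages as prior knowledge: boost the topic-word prior (eta)
#    for the linked words in one designated topic.
dictionary = Dictionary(sentences)
corpus = [dictionary.doc2bow(s) for s in sentences]
num_topics = 3
eta = np.full((num_topics, len(dictionary)), 0.01)
for word in linked:
    eta[0, dictionary.token2id[word]] = 1.0  # topic 0 is reserved for the topic of interest

# 3. Train LDA with the informed prior and inspect the seeded topic.
lda = LdaModel(corpus, num_topics=num_topics, id2word=dictionary,
               eta=eta, random_state=0, passes=10)
print(lda.show_topic(0))

# 4. Topic coherence (u_mass, chosen purely for convenience on this toy corpus).
print(CoherenceModel(model=lda, corpus=corpus, dictionary=dictionary,
                     coherence="u_mass").get_coherence())
```

The last two lines illustrate the kind of coherence measurement referred to in result (1); the specific coherence variant and all hyperparameters here are placeholder choices for the sketch.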
Cite this paper
Uehara, H., Ito, A., Saito, Y., Yoshida, K. (2019). Prior-Knowledge-Embedded LDA with Word2vec – for Detecting Specific Topics in Documents. In: Ohara, K., Bai, Q. (eds) Knowledge Management and Acquisition for Intelligent Systems. PKAW 2019. Lecture Notes in Computer Science, vol. 11669. Springer, Cham. https://doi.org/10.1007/978-3-030-30639-7_10