
Improving topic disentanglement via contrastive learning

Published: 01 March 2023

Abstract

With the emergence of deep generative models such as the variational auto-encoder (VAE), research on topic modeling has extended to a new area: neural topic modeling, which aims to learn disentangled topics that help us understand the data better. However, the original VAE framework has been shown to be limited in disentanglement performance, and it passes these inherent defects on to neural topic models (NTMs). In this paper, we argue that the optimization objectives of contrastive learning are consistent with two important goals of well-disentangled topic learning, alignment and uniformity, and also with two key evaluation measures for topic models, topic coherence and topic diversity. We therefore conclude that the alignment and uniformity of disentangled topic learning can be quantified by topic coherence and topic diversity. Building on this observation, we propose the Contrastive Disentangled Neural Topic Model (CNTM). By representing both words and topics as low-dimensional vectors in the same embedding space, we apply contrastive learning to neural topic modeling to produce factorized and disentangled topics in an interpretable manner. We compare CNTM with strong baseline models on widely used metrics. Our model achieves the best topic coherence scores under the most general evaluation setting (100% of topics selected), improving on the second-best models' scores by 25.0%, 10.9%, 24.6%, and 51.3% on the 20 Newsgroups, Web Snippets, Tag My News, and Reuters datasets, respectively. Our method also achieves the second-best topic diversity scores on 20 Newsgroups and Web Snippets. These results show that CNTM effectively leverages the disentanglement ability of contrastive learning to address the inherent defects of neural topic modeling and obtain better topic quality.
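The correspondence the abstract draws between contrastive learning and disentangled topic learning can be made concrete with the alignment and uniformity losses of Wang and Isola (2020). Below is a minimal PyTorch sketch of those two losses, not the authors' implementation; function names, defaults, and the toy usage are illustrative only.

```python
import torch
import torch.nn.functional as F

def align_loss(x, y, alpha=2):
    # Alignment: L2-normalized embeddings of positive pairs (x[i], y[i])
    # should lie close together on the unit hypersphere. The paper relates
    # this property to topic coherence.
    return (x - y).norm(p=2, dim=1).pow(alpha).mean()

def uniform_loss(x, t=2):
    # Uniformity: embeddings should spread out over the hypersphere,
    # measured as the log of the mean pairwise Gaussian potential. The
    # paper relates this property to topic diversity.
    return torch.pdist(x, p=2).pow(2).mul(-t).exp().mean().log()

# Toy usage with random unit vectors standing in for topic/word embeddings.
x = F.normalize(torch.randn(256, 50), dim=1)
y = F.normalize(x + 0.1 * torch.randn_like(x), dim=1)  # noisy positives
loss = align_loss(x, y) + uniform_loss(x)
```

Lower values of both losses indicate better-aligned positive pairs and a more uniform, hence more diverse, embedding distribution.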

Highlights

We propose a contrastive disentangled neural topic model (CNTM) based on topic embeddings.
We introduce contrastive learning to neural topic modeling in an interpretable way.
We interpret the contrastive learning objective through mutual information (MI) maximization theory (see the sketch after this list).
Experimental results demonstrate our proposed model's disentanglement effectiveness.
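The MI-maximization reading of the contrastive objective is usually grounded in the InfoNCE loss, which lower-bounds the mutual information between a representation and its positive sample. The following is a generic in-batch InfoNCE sketch in PyTorch, given as an illustration of that connection rather than the paper's exact objective; `info_nce` and its arguments are hypothetical names.

```python
import torch
import torch.nn.functional as F

def info_nce(query, positive, temperature=0.07):
    # query, positive: (batch, dim); row i of `positive` is the positive
    # sample for row i of `query`; all other rows act as negatives.
    q = F.normalize(query, dim=1)
    p = F.normalize(positive, dim=1)
    logits = q @ p.t() / temperature        # (batch, batch) similarity scores
    labels = torch.arange(q.size(0), device=q.device)
    # Cross-entropy over in-batch candidates is the InfoNCE loss;
    # minimizing it tightens a lower bound on I(query; positive).
    return F.cross_entropy(logits, labels)
```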

Published In

Information Processing and Management: an International Journal, Volume 60, Issue 2 (March 2023), 1443 pages

Publisher

Pergamon Press, Inc., United States

Author Tags

  1. Topic model
  2. Contrastive learning
  3. Disentanglement

Qualifiers

  • Research-article
