DOI: 10.1145/2911451.2914729
Short paper

Using Word Embedding to Evaluate the Coherence of Topics from Twitter Data

Published: 07 July 2016

Abstract

Scholars often seek to understand topics discussed on Twitter using topic modelling approaches. Several metrics have been proposed for evaluating the coherence of the topics generated by these approaches, including metrics based on the pre-calculated Pointwise Mutual Information (PMI) of word pairs and on Latent Semantic Analysis (LSA) word representation vectors. As Twitter data contains abbreviations and a number of peculiarities (e.g. hashtags), it can be challenging to train effective PMI data or LSA word representations. Recently, Word Embedding (WE) has emerged as a particularly effective approach for capturing the similarity among words. Hence, in this paper, we propose new Word Embedding-based topic coherence metrics. To determine the usefulness of these new metrics, we compare them with the previous PMI/LSA-based metrics. We also conduct a large-scale crowdsourced user study to determine whether the new Word Embedding-based metrics better align with human preferences. Using two Twitter datasets, our results show that the WE-based metrics can capture the coherence of topics in tweets more robustly and efficiently than the PMI/LSA-based ones.
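
As a rough illustration of what a Word Embedding-based coherence metric can look like, the sketch below scores a topic by the average pairwise cosine similarity of the embedding vectors of its top words. The abstract does not spell out the exact formula, so this averaging scheme is an assumption, and the function names (cosine, embedding_coherence) and the three-dimensional TOY_VECTORS are hypothetical stand-ins for vectors that would normally come from a model trained on tweets.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def embedding_coherence(topic_words, vectors):
    """Average pairwise cosine similarity of a topic's top words.

    Assumption: words without a vector are skipped, and a topic with
    fewer than two known words gets a score of 0.0.
    """
    known = [w for w in topic_words if w in vectors]
    pairs = list(combinations(known, 2))
    if not pairs:
        return 0.0
    total = sum(cosine(vectors[a], vectors[b]) for a, b in pairs)
    return total / len(pairs)


# Hypothetical toy vectors, made up for illustration only.
TOY_VECTORS = {
    "vote": [0.9, 0.1, 0.0],
    "election": [0.8, 0.2, 0.1],
    "#vote": [0.7, 0.3, 0.0],
    "pizza": [0.0, 0.1, 0.9],
    "lol": [0.1, 0.9, 0.2],
}

coherent_score = embedding_coherence(["vote", "election", "#vote"], TOY_VECTORS)
mixed_score = embedding_coherence(["vote", "pizza", "lol"], TOY_VECTORS)
print(coherent_score > mixed_score)
```

Under these toy vectors, the topic whose words point in similar directions scores higher than the mixed one. Because hashtags and abbreviations are simply additional keys in the vector table, a sketch like this needs no special handling for Twitter-specific tokens, provided the embedding model was trained on text that contains them.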

Published In

SIGIR '16: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval
July 2016
1296 pages
ISBN: 9781450340694
DOI: 10.1145/2911451
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. coherence metrics
  2. lda
  3. topic models
  4. twitter
  5. twitter lda
  6. word embeddings

Qualifiers

  • Short-paper

Conference

SIGIR '16

Acceptance Rates

SIGIR '16 Paper Acceptance Rate 62 of 341 submissions, 18%;
Overall Acceptance Rate 792 of 3,983 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 46
  • Downloads (last 6 weeks): 3
Reflects downloads up to 20 Jan 2025

Cited By

  • (2024) Long COVID Discourse in Canada, the United States, and Europe: Topic Modeling and Sentiment Analysis of Twitter Data. Journal of Medical Internet Research 26 (e59425). DOI: 10.2196/59425. Online publication date: 9-Dec-2024
  • (2024) Word Embedding-Based Text Complexity Analysis. Wisdom, Well-Being, Win-Win (283-292). DOI: 10.1007/978-3-031-57867-0_21. Online publication date: 10-Apr-2024
  • (2023) Marketing Insights from Reviews Using Topic Modeling with BERTopic and Deep Clustering Network. Applied Sciences 13:16 (9443). DOI: 10.3390/app13169443. Online publication date: 21-Aug-2023
  • (2023) Topics in the Haystack: Enhancing Topic Quality through Corpus Expansion. Computational Linguistics 50:2 (619-655). DOI: 10.1162/coli_a_00506. Online publication date: 1-Jun-2023
  • (2023) Research on Topic Identification of Safety Hazard Information in Oilfield Enterprises. Proceedings of the 2023 12th International Conference on Computing and Pattern Recognition (271-278). DOI: 10.1145/3633637.3633680. Online publication date: 27-Oct-2023
  • (2023) A Review of Stability in Topic Modeling: Metrics for Assessing and Techniques for Improving Stability. ACM Computing Surveys 56:5 (1-32). DOI: 10.1145/3623269. Online publication date: 27-Nov-2023
  • (2023) Neural Personalized Topic Modeling for Mining User Preferences on Social Media. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (1545-1555). DOI: 10.1145/3583780.3614987. Online publication date: 21-Oct-2023
  • (2023) Topic Modeling-Based Framework for Extracting Marketing Information From E-Commerce Reviews. IEEE Access 11 (135049-135060). DOI: 10.1109/ACCESS.2023.3337808. Online publication date: 2023
  • (2022) Topic modeling revisited: New evidence on algorithm performance and quality metrics. PLOS ONE 17:4 (e0266325). DOI: 10.1371/journal.pone.0266325. Online publication date: 28-Apr-2022
  • (2022) Federated Non-negative Matrix Factorization for Short Texts Topic Modeling with Mutual Information. 2022 International Joint Conference on Neural Networks (IJCNN) (1-7). DOI: 10.1109/IJCNN55064.2022.9892602. Online publication date: 18-Jul-2022
