DOI:10.1145/1281192.1281246
Article

Automatic labeling of multinomial topic models

Published: 12 August 2007

Abstract

Multinomial distributions over words are frequently used to model topics in text collections. A major challenge common to all such topic models, in any text mining application, is labeling a multinomial topic model accurately so that a user can interpret the discovered topic. So far, such labels have been generated manually and subjectively. In this paper, we propose probabilistic approaches to labeling multinomial topic models automatically and objectively. We cast the labeling problem as an optimization problem that involves minimizing the Kullback-Leibler divergence between word distributions and maximizing the mutual information between a label and a topic model. Experiments with a user study were carried out on two text data sets of different genres. The results show that the proposed labeling methods are quite effective at generating labels that are meaningful and useful for interpreting the discovered topic models. Our methods are general and can be applied to labeling topics learned through all kinds of topic models, such as PLSA, LDA, and their variations.
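
To make the optimization view concrete, here is a minimal Python sketch of one ingredient named in the abstract: ranking candidate labels by the Kullback-Leibler divergence between a topic's word distribution and a word distribution associated with each label. It is an illustration under stated assumptions, not the paper's actual algorithm. The function names `kl_divergence` and `rank_labels`, the `eps` smoothing constant, and all probabilities below are hypothetical; how label word distributions are estimated and how mutual information enters the score are not specified in the abstract and are left out.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Return D(p || q) for word distributions given as {word: probability}.

    Words missing from q are floored at eps (an assumed smoothing choice)
    so that log(0) never occurs.
    """
    total = 0.0
    for word, pw in p.items():
        if pw > 0.0:
            qw = max(q.get(word, 0.0), eps)
            total += pw * math.log(pw / qw)
    return total


def rank_labels(topic, label_dists):
    """Sort candidate labels from lowest to highest divergence from the topic."""
    return sorted(
        label_dists,
        key=lambda label: kl_divergence(topic, label_dists[label]),
    )


# Hypothetical topic and candidate-label distributions (made-up numbers).
topic = {"cluster": 0.4, "data": 0.3, "algorithm": 0.2, "tree": 0.1}
label_dists = {
    "clustering algorithm": {"cluster": 0.45, "algorithm": 0.3, "data": 0.2, "tree": 0.05},
    "decision tree": {"tree": 0.6, "data": 0.2, "algorithm": 0.15, "cluster": 0.05},
}

ranking = rank_labels(topic, label_dists)
print(ranking)
```

In this toy setup the label whose word distribution lies closest to the topic is ranked first. A fuller treatment would also need a principled way to generate candidate labels and estimate their distributions from a reference collection.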

    Published In

    KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2007
    1080 pages
    ISBN:9781595936097
    DOI:10.1145/1281192

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. multinomial distribution
    2. statistical topic models
    3. topic model labeling

    Qualifiers

    • Article

    Conference

    KDD07

    Acceptance Rates

    KDD '07 Paper Acceptance Rate 111 of 573 submissions, 19%;
    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 94
    • Downloads (last 6 weeks): 12
    Reflects downloads up to 09 Jan 2025

    Citations

    Cited By

    • (2024) Topic Modelling: Going beyond Token Outputs. Big Data and Cognitive Computing, 8:5 (44). DOI:10.3390/bdcc8050044. Online publication date: 25-Apr-2024
    • (2024) Vision AI System Development for Improved Productivity in Challenging Industrial Environments: A Sustainable and Efficient Approach. Applied Sciences, 14:7 (2750). DOI:10.3390/app14072750. Online publication date: 25-Mar-2024
    • (2024) TopicTag: Automatic Annotation of NMF Topic Models Using Chain of Thought and Prompt Tuning with LLMs. Proceedings of the ACM Symposium on Document Engineering 2024 (1-4). DOI:10.1145/3685650.3685667. Online publication date: 20-Aug-2024
    • (2024) A Hybrid Neural Network Model for Sentiment Analysis of Financial Texts Using Topic Extraction, Pre-Trained Model, and Enhanced Attention Mechanism Methods. IEEE Access, 12 (98207-98224). DOI:10.1109/ACCESS.2024.3429150. Online publication date: 2024
    • (2024) Finding Long-COVID: temporal topic modeling of electronic health records from the N3C and RECOVER programs. npj Digital Medicine, 7:1. DOI:10.1038/s41746-024-01286-3. Online publication date: 21-Oct-2024
    • (2024) Top2Label. Expert Systems with Applications: An International Journal, 242:C. DOI:10.1016/j.eswa.2023.122676. Online publication date: 16-May-2024
    • (2024) Dynamic topic modelling for exploring the scientific literature on coronavirus: an unsupervised labelling technique. International Journal of Data Science and Analytics. DOI:10.1007/s41060-024-00610-0. Online publication date: 13-Aug-2024
    • (2024) Topic Label Generation in the Popular Science Corpus. Digital Geography (107-121). DOI:10.1007/978-3-031-67762-5_9. Online publication date: 9-Nov-2024
    • (2023) Solution to Social Problems Using Topic Modeling: From the Perspective of Social Polarization and Multiple Disparities. Journal of Digital Contents Society, 24:8 (1741-1751). DOI:10.9728/dcs.2023.24.8.1741. Online publication date: 31-Aug-2023
    • (2023) Revolution trend investigation of tourism destination image with machine learning. Journal of Vacation Marketing. DOI:10.1177/13567667231213152. Online publication date: 9-Nov-2023
