research-article

Dirichlet Process Mixture Model for Document Clustering with Feature Partition

Published: 01 August 2013

Abstract

Finding the appropriate number of clusters into which documents should be partitioned is crucial in document clustering. In this paper, we propose a novel approach, namely DPMFP, to discover the latent cluster structure based on the Dirichlet process mixture (DPM) model without requiring the number of clusters as input. Document features are automatically partitioned into two groups, discriminative words and nondiscriminative words, which contribute differently to document clustering. A variational inference algorithm is investigated to infer the document collection structure and the partition of document words at the same time. Our experiments indicate that the proposed approach performs well on the synthetic data set as well as on real data sets. The comparison between our approach and state-of-the-art document clustering approaches shows that our approach is robust and effective for document clustering.
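To illustrate why a Dirichlet process prior removes the need to fix the number of clusters in advance, the sketch below samples cluster labels from a Chinese restaurant process, one standard representation of the Dirichlet process. It is not the DPMFP model or its variational inference algorithm: it covers only the prior over partitions, ignores document words and the feature partition entirely, and the function name `crp_assignments` and the concentration value `alpha` are assumptions made for illustration.

```python
import random


def crp_assignments(n_docs, alpha, seed=0):
    """Sample cluster labels for n_docs items from a Chinese restaurant process.

    alpha (> 0) is the concentration parameter: larger values make new
    clusters more likely. The number of clusters is not an input; it
    emerges from the sampling process.
    """
    rng = random.Random(seed)
    counts = []  # counts[k] = number of items currently in cluster k
    labels = []  # labels[i] = cluster index assigned to item i
    for _ in range(n_docs):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1    # join an existing cluster
        labels.append(k)
    return labels, counts


if __name__ == "__main__":
    labels, counts = crp_assignments(n_docs=200, alpha=1.0, seed=1)
    print("number of clusters:", len(counts))
    print("cluster sizes:", counts)
```

Running the sketch with different `alpha` values shows the number of occupied clusters growing or shrinking without ever being specified directly, which is the property the abstract relies on.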

Published In

IEEE Transactions on Knowledge and Data Engineering, Volume 25, Issue 8
August 2013
241 pages

Publisher

IEEE Educational Activities Department

United States

Author Tags

  1. Database management
  2. Dirichlet process mixture model
  3. clustering document clustering
  4. database applications-text mining
  5. feature partition
  6. pattern recognition

Qualifiers

  • Research-article

Cited By

  • (2023) Short text topic modelling using local and global word-context semantic correlation. Multimedia Tools and Applications 82:17 (26411-26433). DOI: 10.1007/s11042-023-14352-x. Online publication date: 2-Feb-2023
  • (2022) Short text topic modelling approaches in the context of big data: taxonomy, survey, and analysis. Artificial Intelligence Review 56:6 (5133-5260). DOI: 10.1007/s10462-022-10254-w. Online publication date: 26-Oct-2022
  • (2021) Nonparametric method of topic identification using granularity concept and graph-based modeling. Neural Computing and Applications 35:2 (1055-1075). DOI: 10.1007/s00521-020-05662-4. Online publication date: 13-Jan-2021
  • (2020) Aquilis. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 4:4 (1-28). DOI: 10.1145/3432205. Online publication date: 18-Dec-2020
  • (2020) Targeted aspects oriented topic modeling for short texts. Applied Intelligence 50:8 (2384-2399). DOI: 10.1007/s10489-020-01672-w. Online publication date: 1-Aug-2020
  • (2019) X-DMM. Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence (4197-4204). DOI: 10.1609/aaai.v33i01.33014197. Online publication date: 27-Jan-2019
  • (2019) Heterogeneous-Length Text Topic Modeling for Reader-Aware Multi-Document Summarization. ACM Transactions on Knowledge Discovery from Data 13:4 (1-21). DOI: 10.1145/3333030. Online publication date: 8-Aug-2019
  • (2019) Sequential Embedding Induced Text Clustering, a Non-parametric Bayesian Approach. Advances in Knowledge Discovery and Data Mining (68-80). DOI: 10.1007/978-3-030-16142-2_6. Online publication date: 14-Apr-2019
  • (2018) Short text clustering based on Pitman-Yor process mixture model. Applied Intelligence 48:7 (1802-1812). DOI: 10.1007/s10489-017-1055-4. Online publication date: 1-Jul-2018
  • (2017) Inferring Dynamic User Interests in Streams of Short Texts for User Clustering. ACM Transactions on Information Systems 36:1 (1-37). DOI: 10.1145/3072606. Online publication date: 17-Jul-2017
