Short paper. DOI: 10.1145/3570991.3571059

Short Text Clustering in Continuous Time Using Stacked Dirichlet-Hawkes Process with Inverse Cluster Frequency Prior

Published: 04 January 2023

Abstract

Traditional models for short text clustering ignore the time information associated with text documents. However, prior work has shown that the temporal characteristics of streaming documents are significant features for clustering. In this paper, we propose a stacked Dirichlet-Hawkes process with an inverse cluster frequency prior as a simple but effective solution for short text clustering that uses temporal features in continuous time. Built on the classical formulation of the Dirichlet-Hawkes process, our model provides an elegant, theoretically grounded, and interpretable solution while performing on par with recent state-of-the-art models for short text clustering.
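For readers unfamiliar with the classical Dirichlet-Hawkes process (Du et al., 2015), the Python sketch below shows how its prior ties cluster assignment to time. A document arriving at time t joins an existing cluster k with probability proportional to that cluster's self-exciting Hawkes intensity λ_k(t), which is built from the timestamps of the cluster's earlier documents. It starts a new cluster with probability proportional to a base rate λ_0. This is not the authors' model. The stacked structure and the inverse cluster frequency prior are not reproduced, the single exponential triggering kernel is a simplifying assumption, and the function names and parameters (`hawkes_intensity`, `dhp_prior`, `alpha`, `decay`, `base_rate`) are hypothetical.

```python
import math
from typing import Dict, List, Union


def hawkes_intensity(event_times: List[float], t: float, alpha: float, decay: float) -> float:
    """Self-exciting intensity of one cluster at time t.

    Assumption: a single exponential triggering kernel alpha * exp(-decay * (t - t_i)),
    summed over the cluster's past document times t_i < t (hypothetical simplification).
    """
    return sum(alpha * math.exp(-decay * (t - ti)) for ti in event_times if ti < t)


def dhp_prior(
    clusters: Dict[int, List[float]],
    t: float,
    base_rate: float,
    alpha: float = 1.0,
    decay: float = 1.0,
) -> Dict[Union[int, str], float]:
    """Dirichlet-Hawkes-style prior over assignments for a document arriving at time t.

    Existing cluster k gets weight lambda_k(t); a new cluster (key "new") gets base_rate,
    playing the role of lambda_0. Weights are normalized to sum to one.
    """
    weights: Dict[Union[int, str], float] = {
        k: hawkes_intensity(times, t, alpha, decay) for k, times in clusters.items()
    }
    weights["new"] = base_rate
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical toy stream: cluster 0 was active recently, cluster 1 only once, earlier.
    clusters = {0: [0.0, 0.5, 0.9], 1: [0.2]}
    print(dhp_prior(clusters, t=1.0, base_rate=0.1, alpha=1.0, decay=2.0))
```

In a full sampler, each prior weight would be multiplied by a per-cluster text likelihood before the assignment is drawn. Because the kernel decays with elapsed time, clusters with recent documents are favored over clusters whose documents are old.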

Cited By

  • (2023) Short-Text Semantic Similarity (STSS): Techniques, Challenges and Future Perspectives. Applied Sciences 13, 6 (3911). DOI: 10.3390/app13063911. Online publication date: 19-Mar-2023.

      Information

      Published In

      ACM Other conferences
      CODS-COMAD '23: Proceedings of the 6th Joint International Conference on Data Science & Management of Data (10th ACM IKDD CODS and 28th COMAD)
      January 2023
      357 pages
      ISBN:9781450397971
      DOI:10.1145/3570991

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. short text clustering
      2. temporal clustering

      Qualifiers

      • Short-paper
      • Research
      • Refereed limited

      Conference

      CODS-COMAD 2023

      Acceptance Rates

      Overall Acceptance Rate 197 of 680 submissions, 29%

      Article Metrics

      • Downloads (last 12 months): 35
      • Downloads (last 6 weeks): 4
      Reflects downloads up to 17 Dec 2024
