An Efficient Ranking-Centered Density-Based Document Clustering Method

Wathsala Anupama Mohotti &
Richi Nayak

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10939)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Document clustering is a popular method for discovering useful information from text data. This paper proposes a hybrid document clustering method that combines the concepts of ranking, density and shared neighborhood. We use the ranked documents returned by a search engine to build a graph of shared relevant documents. The high-density regions in this graph are processed to form initial clusters. The clustering decisions are then refined using shared neighborhood information. Empirical analysis shows that the proposed method produces accurate and efficient solutions compared with relevant benchmarking methods.
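
The pipeline outlined in the abstract (ranked retrieval, a graph of shared relevant documents, dense regions as initial clusters, refinement by shared neighborhood) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy IDF-overlap ranker standing in for a search engine, the function names `rank_neighbours` and `cluster_documents`, and the parameters `top_k`, `min_shared` and `min_density` are all assumptions made for illustration.

```python
import math
from collections import Counter


def rank_neighbours(docs, top_k):
    """Toy stand-in for a search engine (an assumption, not the paper's engine).

    Each document's terms are used as a query; the result is the set holding
    the document itself plus the ids of its top_k best-scoring other documents,
    scored by the summed IDF of shared terms.
    """
    tokens = [set(d.lower().split()) for d in docs]
    df = Counter(t for terms in tokens for t in terms)
    n = len(docs)
    idf = {t: math.log(n / c) for t, c in df.items()}
    relevant = []
    for q, q_terms in enumerate(tokens):
        scores = []
        for d, d_terms in enumerate(tokens):
            if d == q:
                continue
            s = sum(idf[t] for t in q_terms & d_terms)
            if s > 0:
                scores.append((-s, d))  # negative score: best first when sorted
        scores.sort()
        relevant.append({d for _, d in scores[:top_k]} | {q})
    return relevant


def cluster_documents(docs, top_k=2, min_shared=2, min_density=2):
    """Hypothetical ranking + density + shared-neighbourhood clustering sketch."""
    relevant = rank_neighbours(docs, top_k)
    n = len(docs)

    def shared(a, b):
        # Number of relevant documents the two result sets have in common.
        return len(relevant[a] & relevant[b])

    # Graph of shared relevant documents: keep only sufficiently strong links.
    adj = {a: [b for b in range(n) if b != a and shared(a, b) >= min_shared]
           for a in range(n)}

    # Dense nodes (enough strong links) seed the initial clusters.
    core = {a for a in range(n) if len(adj[a]) >= min_density}
    labels = [-1] * n
    next_label = 0
    for seed in range(n):
        if seed not in core or labels[seed] != -1:
            continue
        labels[seed] = next_label
        stack = [seed]
        while stack:  # flood-fill one connected dense region
            a = stack.pop()
            for b in adj[a]:
                if b in core and labels[b] == -1:
                    labels[b] = next_label
                    stack.append(b)
        next_label += 1

    # Refinement: a non-dense document joins the cluster whose dense members
    # it shares the most relevant documents with; otherwise it stays -1.
    for a in range(n):
        if labels[a] != -1:
            continue
        votes = Counter()
        for b in core:
            w = shared(a, b)
            if w > 0:
                votes[labels[b]] += w
        if votes:
            labels[a] = votes.most_common(1)[0][0]
    return labels
```

Calling `cluster_documents` on a list of strings returns one integer label per document, with `-1` marking documents that share no relevant results with any dense region. The thresholds here are arbitrary defaults for a tiny corpus and would need tuning on real data.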

Author information

Authors and Affiliations

Queensland University of Technology (QUT), Brisbane, Australia
Wathsala Anupama Mohotti & Richi Nayak

Corresponding author

Correspondence to Wathsala Anupama Mohotti.

Editor information

Editors and Affiliations

Deakin University, Geelong, Victoria, Australia
Dinh Phung
National Chiao Tung University, Hsinchu City, Taiwan
Vincent S. Tseng
Monash University, Clayton, Victoria, Australia
Geoffrey I. Webb
Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan
Bao Ho
University of Melbourne, Melbourne, Victoria, Australia
Mohadeseh Ganji
University of Melbourne, Melbourne, Victoria, Australia
Lida Rashidi

About this paper

Cite this paper

Mohotti, W.A., Nayak, R. (2018). An Efficient Ranking-Centered Density-Based Document Clustering Method. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science, vol 10939. Springer, Cham. https://doi.org/10.1007/978-3-319-93040-4_35

DOI: https://doi.org/10.1007/978-3-319-93040-4_35
Published: 17 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93039-8
Online ISBN: 978-3-319-93040-4
eBook Packages: Computer Science, Computer Science (R0)
