Abstract
The number of documents grows day by day, and document clustering techniques are used to manage this growth. Document clustering is a significant tool for web search engines, data mining, and knowledge discovery. In this paper, we introduce a new framework, graph-based frequent term set for document clustering (GBFTDC). In this study, document clustering is performed to extract useful information from a document dataset based on frequent term sets. We generate association rules as a pre-processing step and then apply a clustering approach.
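As a rough illustration of the general idea of clustering by frequent term sets, the following minimal Python sketch mines term sets that occur in at least a minimum number of documents and groups each document under the largest frequent term set it contains. It is not the GBFTDC framework itself: the graph-based construction and the association-rule pre-processing are not reproduced here, and the function names, the `min_support` and `max_size` parameters, and the toy documents are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def frequent_term_sets(docs, min_support, max_size=2):
    """Map each frequent term set to the indices of the documents containing it.

    docs is a list of token lists; a term set is frequent when it occurs
    in at least min_support documents (hypothetical threshold).
    """
    term_docs = defaultdict(set)
    for i, terms in enumerate(docs):
        for term in set(terms):
            term_docs[frozenset([term])].add(i)

    # Level 1: frequent single terms.
    frequent = {ts: ids for ts, ids in term_docs.items() if len(ids) >= min_support}
    current = dict(frequent)

    # Level k: join frequent sets of size k-1 (Apriori-style candidate growth).
    for size in range(2, max_size + 1):
        candidates = {}
        for a, b in combinations(current, 2):
            union = a | b
            if len(union) == size and union not in candidates:
                ids = current[a] & current[b]  # documents containing both parts
                if len(ids) >= min_support:
                    candidates[union] = ids
        frequent.update(candidates)
        current = candidates
    return frequent


def cluster_by_term_sets(docs, min_support, max_size=2):
    """Assign each document to the largest frequent term set it contains."""
    frequent = frequent_term_sets(docs, min_support, max_size)
    clusters = defaultdict(list)
    for i in range(len(docs)):
        covering = [ts for ts, ids in frequent.items() if i in ids]
        if not covering:
            clusters[frozenset()].append(i)  # no frequent term set applies
            continue
        # Prefer larger term sets, then higher support, then a stable order.
        best = max(covering, key=lambda ts: (len(ts), len(frequent[ts]), sorted(ts)))
        clusters[best].append(i)
    return dict(clusters)


# Toy, already-tokenised documents (illustrative data only).
docs = [
    ["data", "mining", "cluster"],
    ["data", "mining", "rule"],
    ["web", "search", "engine"],
    ["web", "search", "cluster"],
]
clusters = cluster_by_term_sets(docs, min_support=2)
for term_set, members in clusters.items():
    print(sorted(term_set), members)
```

In this sketch the cluster label is simply the term set that the member documents share, which is one reason frequent term sets are attractive for describing clusters; a real system would also need tokenisation, stop-word removal, and a strategy for overlapping clusters.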
Acknowledgments
This work is supported by a research grant from MANIT, Bhopal, India under the Grants in Aid Scheme 2010-11, No. Dean(R&C)/2010/63 dated 31/08/2010.
Copyright information
© 2014 Springer India
About this paper
Cite this paper
Rajput, D.S., Thakur, R.S., Thakur, G.S. (2014). An Integrated Approach and Framework for Document Clustering Using Graph Based Association Rule Mining. In: Babu, B., et al. Proceedings of the Second International Conference on Soft Computing for Problem Solving (SocProS 2012), December 28-30, 2012. Advances in Intelligent Systems and Computing, vol 236. Springer, New Delhi. https://doi.org/10.1007/978-81-322-1602-5_144
DOI: https://doi.org/10.1007/978-81-322-1602-5_144
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-1601-8
Online ISBN: 978-81-322-1602-5
eBook Packages: Engineering, Engineering (R0)