research-article

RETRACTED ARTICLE: A swarm-optimized tree-based association rule approach for classifying semi-structured data using soft computing approach

Published: 01 October 2021

Abstract

Semantic and structural information in XML documents is used to develop a tree-based document classification method for XML data. Document classification, which is a learning problem, plays a central role in information management and data retrieval. In a development context, it has a major role in many applications, especially in classifying, organizing, searching, and concisely representing large volumes of information. This article presents a swarm-optimized tree-based association rule approach for classifying semi-structured data using soft computing. To improve the classification of XML documents, it proposes a tree pruning technique that removes weak and infrequent rules and a binary particle swarm optimization (BPSO) method that optimizes tree construction. The method was evaluated on the Reuters dataset. The results show that the new method performs well in precision and recall compared with current methods.
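The pruning step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how "weak" and "infrequent" are measured, so this example assumes the usual association-rule definitions: a rule is infrequent when its support falls below a threshold and weak when its confidence does. The `Rule` class, the `rule_stats` and `prune_rules` helpers, the thresholds, and the toy documents are all hypothetical and are not taken from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A class association rule: antecedent features imply a class label."""
    antecedent: frozenset  # features (terms or tree paths) required by the rule
    label: str             # class predicted by the rule
    support: float         # fraction of all documents matching rule and label
    confidence: float      # fraction of matching documents that carry the label


def rule_stats(antecedent, label, documents):
    """Compute support and confidence of one rule over labelled documents.

    `documents` is a list of (feature_set, class_label) pairs.
    """
    antecedent = frozenset(antecedent)
    covered = [lab for feats, lab in documents if antecedent <= feats]
    hits = sum(1 for lab in covered if lab == label)
    support = hits / len(documents) if documents else 0.0
    confidence = hits / len(covered) if covered else 0.0
    return Rule(antecedent, label, support, confidence)


def prune_rules(rules, min_support, min_confidence):
    """Drop infrequent rules (low support) and weak rules (low confidence)."""
    return [
        r for r in rules
        if r.support >= min_support and r.confidence >= min_confidence
    ]


# Toy corpus (assumed): each document is a set of features plus a class label.
docs = [
    ({"article/title", "grain"}, "grain"),
    ({"article/title", "grain", "wheat"}, "grain"),
    ({"article/title", "oil"}, "crude"),
    ({"article/title", "grain", "oil"}, "crude"),
]

candidates = [
    rule_stats({"grain"}, "grain", docs),          # support 0.50, confidence 0.67
    rule_stats({"wheat"}, "grain", docs),          # support 0.25: infrequent
    rule_stats({"article/title"}, "crude", docs),  # confidence 0.50: weak
]

kept = prune_rules(candidates, min_support=0.3, min_confidence=0.6)
for r in kept:
    print(f"{sorted(r.antecedent)} -> {r.label} "
          f"support={r.support:.2f} confidence={r.confidence:.2f}")
# prints: ['grain'] -> grain support=0.50 confidence=0.67
```

With these assumed thresholds only the rule from "grain" to the class "grain" survives. In the approach the abstract describes, BPSO would additionally optimize tree construction; that step is not sketched here.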




Published In

Soft Computing - A Fusion of Foundations, Methodologies and Applications  Volume 25, Issue 20
Oct 2021
440 pages
ISSN:1432-7643
EISSN:1433-7479

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 01 October 2021
Accepted: 02 April 2021

Author Tags

  1. Tree-based association rule (TAR)
  2. Document classification
  3. Binary particle swarm optimization (BPSO)
  4. Semi-structured document

