A new similarity computing method based on concept similarity in Chinese text processing

Science in China Series F: Information Sciences

Abstract

The paper proposes a new text similarity computing method based on concept similarity in Chinese text processing. The method first converts a text into a word vector space model and then splits each word into a set of concepts. By computing the inner products between concepts, it obtains the similarity between words. Finally, it computes the similarity between texts from the similarity of their words. The contributions of the paper are: 1) a new formula for computing the similarity between words; 2) a new text similarity computing method based on word similarity; 3) a successful application of the method to similarity computing for Web news; and 4) verification of the method's validity through extensive experiments.
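The abstract outlines the pipeline (text to word vectors, words to concepts, concept inner products to word similarity, word similarity to text similarity) but does not give the paper's formulas. The following is therefore a minimal sketch under stated assumptions, not the authors' method. The toy concept lexicon and every word in it are hypothetical; word similarity is assumed to be the largest normalized inner product over all concept pairs of the two words; and text similarity is aggregated with the soft cosine measure, a swapped-in technique chosen only because it combines term-frequency vectors with a word-similarity matrix.

```python
import math
from collections import Counter
from itertools import product

# HYPOTHETICAL toy lexicon: each word maps to one or more concepts, and each
# concept is a sparse vector of feature weights. These entries are invented for
# illustration and are NOT taken from the paper or from any real lexicon.
Concept = dict[str, float]

LEXICON: dict[str, list[Concept]] = {
    "doctor": [{"human": 1.0, "medical": 1.0, "occupation": 1.0}],
    "nurse": [{"human": 1.0, "medical": 1.0, "occupation": 0.5}],
    "hospital": [{"place": 1.0, "medical": 1.0}],
    "bank": [{"place": 1.0, "finance": 1.0}, {"land": 1.0, "water": 1.0}],
    "river": [{"land": 0.5, "water": 1.0}],
}


def concept_similarity(c1: Concept, c2: Concept) -> float:
    """Normalized inner product (cosine) between two sparse concept vectors."""
    dot = sum(w * c2.get(f, 0.0) for f, w in c1.items())
    n1 = math.sqrt(sum(w * w for w in c1.values()))
    n2 = math.sqrt(sum(w * w for w in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def word_similarity(w1: str, w2: str) -> float:
    """ASSUMED rule: similarity of the best-matching concept pair."""
    if w1 == w2:
        return 1.0
    concepts1, concepts2 = LEXICON.get(w1), LEXICON.get(w2)
    if not concepts1 or not concepts2:
        return 0.0  # words outside the lexicon only match themselves
    return max(concept_similarity(a, b) for a, b in product(concepts1, concepts2))


def text_similarity(words1: list[str], words2: list[str]) -> float:
    """Soft cosine between term-frequency vectors, weighted by word similarity."""
    tf1, tf2 = Counter(words1), Counter(words2)

    def soft_dot(x: Counter, y: Counter) -> float:
        # Sum over all word pairs: tf_x(u) * tf_y(v) * sim(u, v)
        return sum(a * b * word_similarity(u, v)
                   for (u, a), (v, b) in product(x.items(), y.items()))

    denom = math.sqrt(soft_dot(tf1, tf1)) * math.sqrt(soft_dot(tf2, tf2))
    return soft_dot(tf1, tf2) / denom if denom else 0.0


if __name__ == "__main__":
    print(word_similarity("doctor", "nurse"))
    print(word_similarity("bank", "river"))
    print(text_similarity(["doctor", "hospital"], ["nurse", "hospital"]))
```

With these toy entries, `word_similarity("bank", "river")` is driven by the water-related concept of "bank", since the maximum is taken over concept pairs. The two texts `["doctor", "hospital"]` and `["nurse", "hospital"]` share only one of their two words, so a plain cosine over term frequencies gives 0.5, whereas the concept-aware score is much higher because "doctor" and "nurse" have closely overlapping concepts.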



Author information

Correspondence to Jing Peng.

Additional information

Supported by the China Postdoctoral Science Foundation (Grant No. 20060400002), the Sichuan Youth Science and Technology Foundation of China (Grant No. 08JJ0109), the National Natural Science Foundation of China (Grant Nos. 60473051 and 60503037), the National High-tech Research and Development Program of China (Grant No. 2006AA01Z230), and the Beijing Natural Science Foundation (Grant No. 4062018).

About this article

Cite this article

Peng, J., Yang, D., Tang, S. et al. A new similarity computing method based on concept similarity in Chinese text processing. Sci. China Ser. F-Inf. Sci. 51, 1215–1230 (2008). https://doi.org/10.1007/s11432-008-0103-4


  • DOI: https://doi.org/10.1007/s11432-008-0103-4
