A new similarity computing method based on concept similarity in Chinese text processing

Jing Peng¹,², DongQing Yang¹, ShiWei Tang¹, TengJiao Wang¹ & Jun Gao¹


Abstract

The paper proposes a new method for computing text similarity, based on concept similarity, for Chinese text processing. The method first converts a text into a word-level vector space model and then splits each word into a set of concepts. It obtains the similarity between words by computing the inner products between their concepts, and finally computes the similarity between texts from the similarity between words. The contributions of the paper are: 1) a new formula for computing the similarity between words; 2) a new text similarity computing method based on word similarity; 3) a successful application of the method to similarity computing for Web news; and 4) validation of the method through extensive experiments.
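
The pipeline described in the abstract can be illustrated with a small sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: the toy concept lexicon `CONCEPTS`, the use of a normalized inner product (cosine) for word similarity, and the use of a soft-cosine style aggregation over term-frequency vectors for text similarity. ASCII tokens stand in for already-segmented Chinese words.

```python
import math
from collections import Counter
from typing import Dict, List

# Hypothetical concept lexicon: each word is split into weighted concepts.
# A real system would load this from a concept dictionary; these entries
# are invented for the example.
CONCEPTS: Dict[str, Dict[str, float]] = {
    "doctor": {"human": 1.0, "medical": 1.0},
    "physician": {"human": 1.0, "medical": 1.0},
    "hospital": {"place": 1.0, "medical": 1.0},
    "stock": {"finance": 1.0},
}


def word_similarity(w1: str, w2: str) -> float:
    """Normalized inner product of two words' concept vectors (assumed formula)."""
    if w1 == w2:
        return 1.0
    c1, c2 = CONCEPTS.get(w1), CONCEPTS.get(w2)
    if not c1 or not c2:
        return 0.0  # unknown words only match themselves
    dot = sum(weight * c2.get(concept, 0.0) for concept, weight in c1.items())
    norm1 = math.sqrt(sum(v * v for v in c1.values()))
    norm2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (norm1 * norm2)


def _soft_dot(a: Counter, b: Counter) -> float:
    """Sum of a[i] * sim(i, j) * b[j] over all word pairs."""
    return sum(
        fa * word_similarity(wa, wb) * fb
        for wa, fa in a.items()
        for wb, fb in b.items()
    )


def text_similarity(words_a: List[str], words_b: List[str]) -> float:
    """Soft-cosine style text similarity built on word similarity (assumed)."""
    a, b = Counter(words_a), Counter(words_b)  # term-frequency vectors
    denom = math.sqrt(_soft_dot(a, a) * _soft_dot(b, b))
    return _soft_dot(a, b) / denom if denom else 0.0


if __name__ == "__main__":
    print(text_similarity(["doctor", "hospital"], ["physician", "hospital"]))
    print(text_similarity(["doctor"], ["stock"]))
```

With this toy lexicon, the first pair scores roughly 1.0 because "doctor" and "physician" share all their concepts, whereas a plain cosine over exact word matches would give 0.5. The second pair scores 0.0 because the two words share no concept.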

Author information

Authors and Affiliations

School of Electronics Engineering and Computer Science, Peking University, Beijing, 100871, China
Jing Peng, DongQing Yang, ShiWei Tang, TengJiao Wang & Jun Gao
Department of Science and Technology, Chengdu Municipal Public Security Bureau, Chengdu, 610017, China
Jing Peng

Corresponding author

Correspondence to Jing Peng.

Additional information

Supported by the China Postdoctoral Science Foundation (Grant No. 20060400002), the Sichuan Youth Science and Technology Foundation of China (Grant No. 08JJ0109), the National Natural Science Foundation of China (Grant Nos. 60473051, 60503037), the National High-tech Research and Development Program of China (Grant No. 2006AA01Z230) and the Beijing Natural Science Foundation (Grant No. 4062018).

About this article

Cite this article

Peng, J., Yang, D., Tang, S. et al. A new similarity computing method based on concept similarity in Chinese text processing. Sci. China Ser. F-Inf. Sci. 51, 1215–1230 (2008). https://doi.org/10.1007/s11432-008-0103-4

Received: 28 August 2007
Accepted: 11 January 2008
Published: 07 August 2008
Issue Date: September 2008
DOI: https://doi.org/10.1007/s11432-008-0103-4
