Abstract
During the development of a domain-specific Chinese-English machine translation system, the accuracy of Chinese word segmentation on large amounts of training text often drops because of unknown words. Because domain-specific annotated corpora are scarce, supervised learning approaches cannot adapt to the target domain. The resulting segmentation errors propagate into translation knowledge extraction and seriously lower translation quality. To address this domain adaptation problem, we implement Chinese word segmentation in two ways: by exploring n-gram statistical features in a large raw Chinese corpus, and by bilingually motivated Chinese word segmentation. Moreover, we propose a method that combines multiple Chinese word segmentation results using a linear model to improve domain adaptation. For evaluation, we conduct Chinese word segmentation and Chinese-English machine translation experiments on the data of the NTCIR-10 Chinese-English patent task. The experimental results show that the proposed method improves both the F-measure of Chinese word segmentation and the BLEU score of the Chinese-English statistical machine translation system.
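The abstract reports segmentation quality as an F-measure without spelling out the computation. The sketch below shows the standard word-level definition, in which a predicted word counts as correct only when both of its boundaries match the gold segmentation. The function names `to_spans` and `segmentation_prf` and the example sentence are illustrative assumptions, not taken from the paper, and the paper's exact scoring script may differ.

```python
def to_spans(words):
    """Convert a word list into a set of (start, end) character spans."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def segmentation_prf(gold_words, pred_words):
    """Return (precision, recall, F-measure) for one segmented sentence.

    Assumes both word lists concatenate to the same character string.
    """
    gold = to_spans(gold_words)
    pred = to_spans(pred_words)
    correct = len(gold & pred)  # words with both boundaries right
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical example: the middle word is over-segmented.
gold = ["机器", "翻译", "系统"]
pred = ["机器", "翻", "译", "系统"]
p, r, f = segmentation_prf(gold, pred)
print(f"{p:.3f} {r:.3f} {f:.3f}")  # prints "0.500 0.667 0.571"
```

Comparing character spans rather than word strings keeps repeated words at different positions from being matched against each other.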
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Su, C., Zhang, Y., Guo, Z., Xu, J. (2013). Exploring Multiple Chinese Word Segmentation Results Based on Linear Model. In: Zhou, G., Li, J., Zhao, D., Feng, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2013. Communications in Computer and Information Science, vol 400. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41644-6_6
DOI: https://doi.org/10.1007/978-3-642-41644-6_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41643-9
Online ISBN: 978-3-642-41644-6
eBook Packages: Computer Science, Computer Science (R0)