A Sentence Segmentation Method for Ancient Chinese Texts Based on NNLM

Boli Wang, Xiaodong Shi, Zhixing Tan, Yidong Chen & Weili Wang

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10085)

Included in the following conference series:

Workshop on Chinese Lexical Semantics


Abstract

Most ancient Chinese texts lack punctuation and sentence segmentation. Recent research on automatic sentence segmentation of ancient Chinese has mostly relied on sequence labelling models trained on small data sets. In this paper, we propose a sentence segmentation method for ancient Chinese texts based on neural network language models (NNLMs). Experiments on large-scale corpora indicate that our method is effective and achieves results comparable to those of the traditional conditional random field (CRF) model. Applying a sentence length penalty, using larger Simplified Chinese corpora, or dividing the corpora by historical era can further improve the performance of our model.
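The abstract does not spell out how a language model is turned into a segmenter. One common way to do it, sketched here under our own assumptions, is to treat a sentence boundary as an extra token and choose the boundary positions that maximise the model's log-probability, optionally adding a per-sentence length penalty. So that the sketch runs without dependencies, an add-one-smoothed character bigram model stands in for the neural language model. The `<b>` boundary token, the dynamic-programming search, the penalty form `-alpha * |L - target_len|`, and the names `BigramLM`, `segment`, `max_len`, `alpha`, and `target_len` are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical sentence-boundary token; it also serves as the start symbol.
BOUNDARY = "<b>"


class BigramLM:
    """Add-one-smoothed character bigram model (a stand-in for an NNLM)."""

    def __init__(self, sentences):
        self.bigrams = Counter()
        self.context = Counter()
        vocab = {BOUNDARY}
        for sent in sentences:
            vocab.update(sent)
            tokens = [BOUNDARY, *sent, BOUNDARY]
            for prev, cur in zip(tokens, tokens[1:]):
                self.bigrams[(prev, cur)] += 1
                self.context[prev] += 1
        self.vocab_size = len(vocab)

    def logprob(self, prev, cur):
        """log P(cur | prev) with add-one smoothing."""
        num = self.bigrams[(prev, cur)] + 1
        den = self.context[prev] + self.vocab_size
        return math.log(num / den)


def sentence_score(lm, chars):
    """Log-probability of one sentence framed as <b> c1 ... cn <b>."""
    tokens = [BOUNDARY, *chars, BOUNDARY]
    return sum(lm.logprob(p, c) for p, c in zip(tokens, tokens[1:]))


def segment(text, lm, max_len=20, alpha=0.0, target_len=6):
    """Split unpunctuated text into sentences by dynamic programming.

    best[j] is the best total score of any segmentation of text[:j];
    each sentence of length L adds its LM score plus the hypothetical
    length penalty -alpha * |L - target_len|.
    """
    n = len(text)
    best = [-math.inf] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            score = (best[i] + sentence_score(lm, text[i:j])
                     - alpha * abs((j - i) - target_len))
            if score > best[j]:
                best[j], back[j] = score, i
    # Follow back-pointers from the end to recover the sentences.
    sentences, j = [], n
    while j > 0:
        sentences.append(text[back[j]:j])
        j = back[j]
    return sentences[::-1]


if __name__ == "__main__":
    lm = BigramLM(["学而时习之", "不亦说乎"] * 4)
    print(segment("学而时习之不亦说乎", lm))  # ['学而时习之', '不亦说乎']
```

Because every sentence is scored as if it starts and ends with the boundary token, a bigram model lets the total score decompose into independent per-sentence terms, which is what makes this dynamic program exact. A neural language model with longer context would not decompose this way, so a search over its segmentations would typically fall back on an approximation such as beam search.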



Author information

Authors and Affiliations

Department of Cognitive Science, Xiamen University, Xiamen, 361005, China
Boli Wang, Xiaodong Shi, Zhixing Tan, Yidong Chen & Weili Wang
Collaborative Innovation Center for Peaceful Development of Cross-Strait Relations, Xiamen University, Xiamen, 361005, China
Xiaodong Shi
Fujian Province Key Laboratory for Brain-inspired Computing, Xiamen University, Xiamen, 361005, China
Xiaodong Shi


Corresponding author

Correspondence to Xiaodong Shi.

Editor information

Editors and Affiliations

Institute for Infocomm Research, Singapore, Singapore
Minghui Dong
Nanyang Technological University, Singapore, Singapore
Jingxia Lin
Huazhong University of Science and Technology, Wuhan, China
Xuri Tang


About this paper

Cite this paper

Wang, B., Shi, X., Tan, Z., Chen, Y., Wang, W. (2016). A Sentence Segmentation Method for Ancient Chinese Texts Based on NNLM. In: Dong, M., Lin, J., Tang, X. (eds) Chinese Lexical Semantics. CLSW 2016. Lecture Notes in Computer Science, vol 10085. Springer, Cham. https://doi.org/10.1007/978-3-319-49508-8_36


DOI: https://doi.org/10.1007/978-3-319-49508-8_36
Published: 27 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49507-1
Online ISBN: 978-3-319-49508-8
eBook Packages: Computer Science, Computer Science (R0)
