Information and Media Technologies
Online ISSN : 1881-0896
ISSN-L : 1881-0896
Information Systems and Applications
Improving Vietnamese Word Segmentation and POS Tagging using MEM with Various Kinds of Resources
Oanh Thi Tran, Cuong Anh Le, Thuy Quang Ha
JOURNAL FREE ACCESS

2010 Volume 5 Issue 2 Pages 890-909

Abstract

Word segmentation and POS tagging are two important problems underlying many NLP tasks. They have not, however, drawn much attention from Vietnamese researchers. In this paper, we focus on combining the advantages of several resources to improve the accuracy of Vietnamese word segmentation and POS tagging. For word segmentation, we propose a solution that draws on multiple knowledge resources, including a dictionary-based model, an N-gram model, and a named entity recognition model, and integrates them into a Maximum Entropy model. Experiments on a public corpus show its effectiveness in comparison with the best current models: we obtained an F1 measure of 95.30%. For POS tagging, motivated by research on Chinese and by the characteristics of Vietnamese, we present a new kind of feature based on the idea of word composition, which we call morpheme-based features. Our experiments on two POS-tagged corpora showed that morpheme-based features consistently give promising results. In the best case, we obtained 89.64% precision on a Vietnamese POS-tagged corpus using a Maximum Entropy model.
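The abstract does not give the feature templates, so the following is only an illustrative sketch of how dictionary, N-gram, and named-entity cues might be turned into features for a Maximum Entropy classifier, and of what a morpheme-based feature for POS tagging could look like. Every name here (`segmentation_features`, `morpheme_features`, the toy lexicon, the capitalization cue standing in for a named entity model, and the underscore convention for joining syllables) is an assumption made for illustration, not the authors' actual design.

```python
from typing import Dict, List, Set


def segmentation_features(
    syllables: List[str], i: int, lexicon: Set[str]
) -> Dict[str, float]:
    """Features for the syllable at position i (hypothetical templates).

    A Maximum Entropy classifier could use such features to decide whether
    syllable i continues the word started by the previous syllable.
    """
    cur = syllables[i].lower()
    prev = syllables[i - 1].lower() if i > 0 else "<s>"
    nxt = syllables[i + 1].lower() if i + 1 < len(syllables) else "</s>"
    return {
        # N-gram style cues: the syllable itself and its neighbouring bigrams.
        f"cur={cur}": 1.0,
        f"prev_bigram={prev}_{cur}": 1.0,
        f"next_bigram={cur}_{nxt}": 1.0,
        # Dictionary cues: does a two-syllable lexicon entry span this position?
        "prev_cur_in_lexicon": 1.0 if f"{prev} {cur}" in lexicon else 0.0,
        "cur_next_in_lexicon": 1.0 if f"{cur} {nxt}" in lexicon else 0.0,
        # Crude stand-in for a named entity cue (assumption): capitalization.
        "cur_capitalized": 1.0 if syllables[i][:1].isupper() else 0.0,
    }


def morpheme_features(word: str) -> Dict[str, float]:
    """Morpheme-based features for a segmented word (hypothetical templates).

    Assumes the syllables of a multi-syllable word are joined by underscores,
    e.g. "sinh_viên", and treats each syllable as a morpheme.
    """
    parts = word.lower().split("_")
    return {
        f"first_morpheme={parts[0]}": 1.0,
        f"last_morpheme={parts[-1]}": 1.0,
        f"num_morphemes={len(parts)}": 1.0,
    }


if __name__ == "__main__":
    sentence = ["Sinh", "viên", "học", "bài"]
    toy_lexicon = {"sinh viên"}
    print(segmentation_features(sentence, 1, toy_lexicon))
    print(morpheme_features("sinh_viên"))
```

Feature dictionaries of this shape can be passed to any multinomial logistic regression (Maximum Entropy) trainer; the choice of trainer and the full template set are left open here because the abstract does not specify them.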

© 2010 by The Association for Natural Language Processing