article

Unsupervised word sense disambiguation for Korean through the acyclic weighted digraph using corpus and dictionary

Published: 01 May 2006

Abstract

Word sense disambiguation (WSD) assigns the most appropriate sense to a polysemous word according to its context. We present a method for automatic WSD that uses only two resources: a raw text corpus and a machine-readable dictionary (MRD). The system learns a similarity matrix between word pairs from the unlabeled corpus and uses vector representations of the MRD's sense definitions, which are derived from that similarity matrix. To disambiguate all occurrences of polysemous words in a sentence, the system constructs a separate acyclic weighted digraph (AWD) for each occurrence of a polysemous word. Each AWD is built from the senses of the context words that co-occur with the target word in the sentence. Once the AWD for a polysemous word has been built, the optimal path through it is found with the Viterbi algorithm, and the target word is assigned the sense that lies on that path. In experiments, the system achieved 76.4% accuracy on semantically ambiguous Korean words.
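The pipeline described in the abstract (corpus similarity matrix, MRD sense vectors, one AWD per target occurrence, Viterbi search) can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so every concrete choice below is an assumption, not the authors' method: word similarity is taken to be the cosine of sentence-level co-occurrence vectors, a sense vector is the average of the similarity-matrix rows of its definition words, the AWD has one layer of candidate senses per sentence word, and edges between adjacent layers are weighted by cosine similarity. The corpus, the dictionary `MRD`, and all helper names (`similarity_matrix`, `sense_vector`, `viterbi_awd`, `disambiguate`) are hypothetical, and the toy data is English rather than Korean.

```python
import math
from collections import defaultdict
from itertools import combinations

def cooccurrence_vectors(sentences):
    """Count how often each pair of distinct words shares a sentence (assumed context window)."""
    vecs = defaultdict(lambda: defaultdict(float))
    for sent in sentences:
        for a, b in combinations(set(sent), 2):
            vecs[a][b] += 1.0
            vecs[b][a] += 1.0
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def similarity_matrix(sentences):
    """Word-by-word similarity learned from an unlabeled corpus (assumed measure: cosine)."""
    vecs = cooccurrence_vectors(sentences)
    vocab = sorted(vecs)
    return {a: {b: 1.0 if a == b else cosine(vecs[a], vecs[b]) for b in vocab} for a in vocab}

def sense_vector(definition_words, sim):
    """Represent an MRD sense as the average similarity row of its definition words."""
    rows = [sim[w] for w in definition_words if w in sim]
    if not rows:
        return {}
    return {b: sum(r[b] for r in rows) / len(rows) for b in sim}

def candidate_senses(word, mrd, sim):
    """Nodes of one AWD layer; words missing from the MRD get a single pseudo-sense."""
    if word in mrd:
        return [(sid, sense_vector(defn, sim)) for sid, defn in mrd[word].items()]
    return [(word, dict(sim.get(word, {})))]

def viterbi_awd(layers):
    """Max-weight path through a layered DAG; edges link every node of layer i to layer i+1."""
    score = [0.0] * len(layers[0])
    back = []
    for prev, cur in zip(layers, layers[1:]):
        new_score, ptr = [], []
        for _, v in cur:
            cands = [score[j] + cosine(prev[j][1], v) for j in range(len(prev))]
            best_j = max(range(len(prev)), key=cands.__getitem__)
            new_score.append(cands[best_j])
            ptr.append(best_j)
        score = new_score
        back.append(ptr)
    i = max(range(len(score)), key=score.__getitem__)
    path = [i]
    for ptr in reversed(back):  # follow back-pointers from the last layer to the first
        i = ptr[i]
        path.append(i)
    path.reverse()
    return [layers[k][idx][0] for k, idx in enumerate(path)]

def disambiguate(sentence, target, mrd, sim):
    """Build the AWD for one target occurrence and read its sense off the optimal path."""
    layers = [candidate_senses(w, mrd, sim) for w in sentence]
    return viterbi_awd(layers)[sentence.index(target)]

# Hypothetical toy corpus and MRD entry for illustration only.
CORPUS = [
    ["money", "deposit", "account", "loan"],
    ["money", "loan", "interest", "account"],
    ["river", "water", "shore", "fish"],
    ["water", "fish", "shore", "boat"],
]
MRD = {"bank": {"bank/finance": ["money", "account", "loan"],
                "bank/river": ["river", "shore", "water"]}}
SIM = similarity_matrix(CORPUS)

print(disambiguate(["deposit", "money", "bank"], "bank", MRD, SIM))  # bank/finance
print(disambiguate(["fish", "water", "bank"], "bank", MRD, SIM))     # bank/river
```

Because the two toy topics never co-occur, the financial and river senses get zero similarity to each other's context words, so the path through the matching sense always has the higher total weight. A single Viterbi pass suffices here because the layered graph is acyclic, which is what makes the AWD formulation convenient.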



Published In

Information Processing and Management: an International Journal, Volume 42, Issue 3
May 2006
282 pages

Publisher

Pergamon Press, Inc.

United States


Author Tags

  1. AWD
  2. Acyclic weighted digraph
  3. HMM
  4. MRD
  5. Machine-readable dictionary
  6. POS
  7. Viterbi algorithm
  8. WSD
  9. Word sense disambiguation
