research-article

Word sense disambiguation using evolutionary algorithms - Application to Arabic language

Published: 01 December 2014

Abstract

Genetic and memetic algorithms are used to solve the word sense disambiguation problem. Applicability of these algorithms to Modern Standard Arabic is demonstrated by quantifying their potential benefits. The genetic algorithm outperforms the memetic algorithm and a naïve Bayes classifier, attaining 79% precision and 63% recall.

Natural language processing is related to human-computer interaction, where several challenges involve natural language understanding. The word sense disambiguation problem consists in the computational assignment of a meaning to a word according to the particular context in which it occurs. Many natural language processing applications, such as machine translation, information retrieval, and information extraction, require this task, which occurs at the semantic level. Evolutionary computation approaches can be effective for this problem since they have been used successfully for many real-world optimization problems. In this paper, we propose to solve the word sense disambiguation problem using genetic and memetic algorithms, and apply them to Modern Standard Arabic. We demonstrate the performance of several models of our algorithms by carrying out experiments on a large Arabic corpus, and comparing them against a naïve Bayes classifier. Experimental results show that genetic algorithms can achieve more precise prediction than memetic algorithms and the naïve Bayes classifier, attaining 79% precision and 63% recall.
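As a rough illustration of how a genetic algorithm can be cast onto word sense disambiguation, the sketch below encodes one sense index per ambiguous word and evolves assignments toward higher coherence. The abstract does not state the paper's encoding, fitness function, or operators, so everything here is an assumption: the toy English sense inventory `SENSES`, the gloss-overlap fitness (a Lesk-style stand-in), and the names `fitness` and `disambiguate` are hypothetical and are not the authors' method.

```python
import random

# Hypothetical toy sense inventory (assumption, not from the paper):
# each word maps to a list of senses, each sense given as a set of gloss words.
SENSES = {
    "bank": [{"money", "deposit", "loan"}, {"river", "shore", "water"}],
    "interest": [{"money", "loan", "rate"}, {"curiosity", "attention"}],
    "deposit": [{"money", "bank", "account"}, {"sediment", "river", "layer"}],
}


def fitness(words, assignment):
    """Assumed fitness: total pairwise gloss overlap of the chosen senses."""
    glosses = [SENSES[w][s] for w, s in zip(words, assignment)]
    return sum(
        len(glosses[i] & glosses[j])
        for i in range(len(glosses))
        for j in range(i + 1, len(glosses))
    )


def disambiguate(words, pop_size=30, generations=50, mutation_rate=0.1, seed=0):
    """Evolve one sense index per word; returns the best assignment found."""
    rng = random.Random(seed)

    def score(individual):
        return fitness(words, individual)

    def random_individual():
        return [rng.randrange(len(SENSES[w])) for w in words]

    population = [random_individual() for _ in range(pop_size)]
    best = max(population, key=score)
    for _ in range(generations):
        next_population = [best[:]]  # elitism: carry the best individual over
        while len(next_population) < pop_size:
            # Binary tournament selection of two parents.
            parent_a = max(rng.sample(population, 2), key=score)
            parent_b = max(rng.sample(population, 2), key=score)
            # One-point crossover (no cut point exists for a single word).
            cut = rng.randrange(1, len(words)) if len(words) > 1 else 0
            child = parent_a[:cut] + parent_b[cut:]
            # Mutation: resample the sense of a word with small probability.
            for i, w in enumerate(words):
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(len(SENSES[w]))
            next_population.append(child)
        population = next_population
        best = max(population, key=score)
    return best


if __name__ == "__main__":
    context = ["bank", "interest", "deposit"]
    chosen = disambiguate(context)
    print(chosen, fitness(context, chosen))
```

A memetic variant would typically add a local search step (for example, hill climbing on each child) before inserting it into the next generation; that step is omitted here to keep the sketch short.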



      Published In

      Computers in Human Behavior  Volume 41, Issue C
      December 2014
      554 pages

      Publisher

      Elsevier Science Publishers B. V.

      Netherlands

      Publication History

      Published: 01 December 2014

      Author Tags

      1. Evolutionary algorithms
      2. Genetic algorithms
      3. Memetic algorithms
      4. Modern Standard Arabic
      5. Natural language processing
      6. Word sense disambiguation

      Qualifiers

      • Research-article

      Cited By

      • (2023) Stacking of BERT and CNN Models for Arabic Word Sense Disambiguation. ACM Transactions on Asian and Low-Resource Language Information Processing 22:11 (1-14). DOI: 10.1145/3623379. Online publication date: 7-Sep-2023
      • (2023) Assessing American presidential candidates using principles of ontological engineering, word sense disambiguation, data envelope analysis and qualitative comparative analysis. International Journal of Speech Technology 26:3 (743-764). DOI: 10.1007/s10772-023-10043-y. Online publication date: 1-Sep-2023
      • (2022) Arabic Word Sense Disambiguation for Information Retrieval. ACM Transactions on Asian and Low-Resource Language Information Processing 21:4 (1-19). DOI: 10.1145/3510451. Online publication date: 19-Jan-2022
      • (2022) Towards a historical dictionary for Arabic language. International Journal of Speech Technology 25:1 (29-41). DOI: 10.1007/s10772-020-09704-z. Online publication date: 1-Mar-2022
      • (2020) Disambiguating Arabic Words According to Their Historical Appearance in the Document Based on Recurrent Neural Networks. ACM Transactions on Asian and Low-Resource Language Information Processing 19:6 (1-16). DOI: 10.1145/3410569. Online publication date: 15-Oct-2020
      • (2019) Arabic word sense disambiguation: a review. Artificial Intelligence Review 52:4 (2475-2532). DOI: 10.1007/s10462-018-9622-6. Online publication date: 1-Dec-2019
      • (2018) Context-based Arabic Word Sense Disambiguation using Short Text Similarity Measure. Proceedings of the 12th International Conference on Intelligent Systems: Theories and Applications (1-6). DOI: 10.1145/3289402.3289544. Online publication date: 24-Oct-2018
      • (2017) A hybrid genetic-ant colony optimization algorithm for the word sense disambiguation problem. Information Sciences: an International Journal 417:C (20-38). DOI: 10.5555/3138887.3139089. Online publication date: 1-Nov-2017
      • (2017) Ideology algorithm. Neural Computing and Applications 28:1 (845-876). DOI: 10.1007/s00521-016-2379-4. Online publication date: 1-Jan-2017
      • (2017) Memetic Algorithm Based on Global-Best Harmony Search and Hill Climbing for Part of Speech Tagging. Mining Intelligence and Knowledge Exploration (198-211). DOI: 10.1007/978-3-319-71928-3_20. Online publication date: 13-Dec-2017
