Abstract
This paper proposes a hybrid possibilistic approach for bilingual terminology extraction using possibility and necessity measures. First, we extract domain-relevant terms from the source language; second, we build a co-occurrence-based translation graph, which is mined to translate these terms into the target language. We compare our approach with several state-of-the-art approaches. Experimental results show that the possibilistic approach achieves better Recall, Precision and Mean Average Precision (MAP). The differences between the compared approaches are statistically significant according to the reported p-values.
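The abstract names possibility and necessity measures without defining them. As a minimal sketch, assuming the standard textbook definitions from possibility theory (the possibility of an event is the maximum of the possibility distribution over its elements, and its necessity is one minus the possibility of its complement), the two measures can be computed as below. The function names, the candidate labels `t1` to `t3` and the distribution values are hypothetical and chosen only for illustration; this is not the scoring scheme used in the paper.

```python
from typing import Dict, Iterable


def possibility(event: Iterable[str], pi: Dict[str, float]) -> float:
    """Pi(A): maximum of pi(x) over x in A; 0.0 for the empty event."""
    return max((pi[x] for x in event), default=0.0)


def necessity(event: Iterable[str], pi: Dict[str, float]) -> float:
    """N(A) = 1 - Pi(complement of A), with the complement taken within pi."""
    event_set = set(event)
    complement = [x for x in pi if x not in event_set]
    return 1.0 - possibility(complement, pi)


# Hypothetical normalized possibility distribution over three
# candidate translations (illustrative values, not from the paper).
pi = {"t1": 1.0, "t2": 0.6, "t3": 0.2}

print(possibility({"t1", "t2"}, pi))
print(necessity({"t1", "t2"}, pi))
```

An event is fully possible as soon as one of its elements is, while it becomes necessary only to the degree that every alternative outside it is implausible. That is why the two measures can be read as optimistic and pessimistic scores for the same candidate.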
Notes
- 1.
In parallel corpora, documents are translated sentence by sentence.
- 2.
In comparable corpora, documents deal with the same topics and subjects.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Lahbib, W., Bounhas, I., Slimani, Y. (2018). A Possibilistic Approach for Arabic Domain Terminology Extraction and Translation. In: Czachórski, T., Gelenbe, E., Grochla, K., Lent, R. (eds) Computer and Information Sciences. ISCIS 2018. Communications in Computer and Information Science, vol 935. Springer, Cham. https://doi.org/10.1007/978-3-030-00840-6_25
DOI: https://doi.org/10.1007/978-3-030-00840-6_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00839-0
Online ISBN: 978-3-030-00840-6
eBook Packages: Computer Science, Computer Science (R0)