Abstract
Abbreviations are very common and are widely used in both written and spoken language. However, they are not always explicitly defined, and many of them are ambiguous. In this research, we present a process that attempts to resolve abbreviation ambiguity. Various methods have been explored, including context-related and statistical ones. The application domain is Jewish Law documents written in Hebrew, which are known to be rich in ambiguous abbreviations. Several variants of the one sense per discourse hypothesis, differing in the scope of discourse they consider, have been implemented. A number of common machine learning methods have been tested to find an effective way of integrating these variants. The best result, 96.09% accuracy, has been achieved by SVM.
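The abstract does not say exactly how each discourse-scope variant is turned into a feature. One plausible reading, sketched below purely as an illustration, is that each variant contributes the majority sense of the same abbreviation among sense-labelled occurrences inside that scope. The scope names (`paragraph`, `section`, `document`, `corpus`), the `Occurrence` record, the sense labels, and the majority-vote rule are all hypothetical assumptions, not the authors' definitions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical discourse scopes, from narrowest to widest (assumption, not from the paper).
SCOPES = ("paragraph", "section", "document", "corpus")


@dataclass(frozen=True)
class Occurrence:
    """One occurrence of an abbreviation (hypothetical record layout)."""
    abbrev: str
    document: str
    section: str
    paragraph: str
    sense: Optional[str] = None  # known sense for labelled data, None if unresolved


def scope_key(occ: Occurrence, scope: str) -> tuple:
    """Return the key two occurrences must share to fall in the same scope."""
    if scope == "paragraph":
        return (occ.document, occ.section, occ.paragraph)
    if scope == "section":
        return (occ.document, occ.section)
    if scope == "document":
        return (occ.document,)
    if scope == "corpus":
        return ()
    raise ValueError(f"unknown scope: {scope}")


def majority_sense(target: Occurrence, occurrences: List[Occurrence], scope: str) -> Optional[str]:
    """Majority sense of target.abbrev among other labelled occurrences in the same scope."""
    key = scope_key(target, scope)
    votes = Counter(
        o.sense
        for o in occurrences
        if o is not target
        and o.abbrev == target.abbrev
        and o.sense is not None
        and scope_key(o, scope) == key
    )
    if not votes:
        return None  # no evidence within this scope
    # Ties go to the sense encountered first (Counter.most_common ordering).
    return votes.most_common(1)[0][0]


def ospd_features(target: Occurrence, occurrences: List[Occurrence]) -> Dict[str, Optional[str]]:
    """One 'one sense per discourse' feature per hypothetical scope variant."""
    return {scope: majority_sense(target, occurrences, scope) for scope in SCOPES}


if __name__ == "__main__":
    rows = [
        ("doc1", "s1", "p1", "sense_A"), ("doc1", "s1", "p2", "sense_B"),
        ("doc1", "s1", "p2", "sense_B"), ("doc1", "s2", "p3", "sense_A"),
        ("doc1", "s2", "p4", "sense_A"), ("doc1", "s2", "p5", "sense_A"),
        ("doc2", "s1", "p1", "sense_B"), ("doc2", "s1", "p1", "sense_B"),
        ("doc2", "s1", "p1", "sense_B"), ("doc2", "s1", "p1", "sense_B"),
    ]
    corpus = [Occurrence("AB", d, s, p, sense) for d, s, p, sense in rows]
    target = Occurrence("AB", "doc1", "s1", "p1")  # unresolved occurrence
    print(ospd_features(target, corpus))
    # -> {'paragraph': 'sense_A', 'section': 'sense_B', 'document': 'sense_A', 'corpus': 'sense_B'}
```

The toy data shows why the scope matters: the same occurrence receives different majority senses depending on how wide the discourse window is. In a setup like the one the abstract describes, these per-scope values could be encoded as categorical features and combined by a learner such as an SVM; that step is not shown here.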
References
Abdi, H., Valentin, D., Edelman, B.: Neural Networks. Sage, Thousand Oaks (1999)
Adar, E.: S-RAD: A Simple and Robust Abbreviation Dictionary. Technical Report, HP Laboratories (2002)
Ashkenazi, S., Jarden, D.: Ozar Rashe Tevot: Thesaurus of Hebrew Abbreviations (in Hebrew). Kiryat Sefer Ltd., Jerusalem (1994)
Cortes, C., Vapnik, V.: Support-Vector Networks. Machine Learning 20, 273–297 (1995)
Chang, C., Lin, C.: LIBSVM: a Library for Support Vector Machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm (2001)
Frantzi, K., Ananiadou, S.: The C-value domain independent method for multiword term extraction. JNLP 6(3), 145–179 (1999)
Gale, W., Church, K., Yarowsky, D.: One Sense per Discourse. In: Proceedings of the 4th DARPA Speech and Natural Language Workshop, pp. 233–237 (1992)
Gaudan, S., Kirsch, H., Rebholz-Schuhmann, D.: Resolving Abbreviations to their Senses in Medline. Bioinformatics 21(18), 3658–3664 (2005)
Good, I. J.: The Estimation of Probabilities: An Essay on Modern Bayesian Methods. MIT Press, Cambridge (1965)
Hacohen, Y. M.: Mishnah Berurah (in Hebrew). Hotzaat Leshem, Jerusalem (1995)
Hacohen, Y. M.: Mishnah Berurah. English Translation, Pisgah Foundation. Feldheim Publishers, Jerusalem (1990)
HaCohen-Kerner, Y., Kass, A., Peretz, A.: Baseline Methods for Automatic Disambiguation of Abbreviations in Jewish Law Documents. In: Vicedo, J. L., Martinez-Barco, P., Munoz, R., Noeda, M. S. (eds.) EsTAL 2004. LNCS (LNAI), vol. 3230, pp. 58–69. Springer, Heidelberg (2004)
Ide, N., Véronis, J.: Word Sense Disambiguation: The State of the Art. Computational Linguistics 24(1), 1–40 (1998)
Joint Commission on Accreditation of Healthcare Organizations: Medication errors related to potentially dangerous abbreviations. Sentinel Event Alert 23 (2001)
Liu, H., Aronson, A. R., Friedman, C.: A Study of Abbreviations in MEDLINE Abstracts. In: Proc AMIA Symp., pp. 464–469 (2002)
Miller, G. A.: The Magical Number Seven, Plus or Minus Two: Some Limits on our Capacity for Processing Information. Psychological Review 63, 81–97 (1956)
Okazaki, N., Ananiadou, S.: Building an Abbreviation Dictionary using a Term Recognition Approach. Bioinformatics 22(24), 3089–3095 (2006)
Okazaki, N., Ananiadou, S.: Clustering Acronyms in Biomedical Text for Disambiguation. In: Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC), pp. 959–962 (2006)
Ovadia, Y.: Yechave Daat (in Hebrew). Chazon Ovadia, Jerusalem (1977)
Ovadia, Y.: Yabia Omer (in Hebrew). Chazon Ovadia, Jerusalem (1986)
Pakhomov, S.: Semi-Supervised Maximum Entropy Based Approach to Acronym and Abbreviation Normalization in Medical Texts. Association for Computational Linguistics (ACL), pp. 160–167 (2002)
Pakhomov, S., Pedersen, T., Chute, C. G.: Abbreviation and Acronym Disambiguation in Clinical Discourse. In: American Medical Informatics Association Annual Symposium, pp. 589–593 (2005)
Pedersen, T., Patwardhan, S., Michelizzi, J.: WordNet::Similarity - Measuring the Relatedness of Concepts. In: Proceedings of the 19th National Conference on Artificial Intelligence, pp. 1024–1025 (2004)
Pustejovsky, J., Castano, J., Cochran, B., Kotecki, M., Morrell, M., Rumshisky, A.: Extraction and Disambiguation of Acronym-Meaning Pairs in Medline (unpublished manuscript) (2001)
Quinlan, J. R.: C4.5: Programs For Machine Learning. Morgan Kaufmann, Los Altos (1993)
Salton, G.: The SMART Information Retrieval System: Experiments in Automatic Document Processing. Prentice Hall, Englewood Cliffs (1971)
Witten, I. H., Frank, E.: Weka 3.4.12: Machine Learning Software in Java (2007), http://www.cs.waikato.ac.nz/~ml/weka
Yarowsky, D.: One Sense per Collocation. In: Proceedings of the Workshop on Human Language Technology, pp. 266–271 (1993)
Yu, H., Hripcsak, G., Friedman, C.: Mapping Abbreviations to Full Forms in Biomedical Articles. J. Am. Med. Inform. Assoc. 9(3), 262–272 (2002)
Yu, Z., Tsuruoka, Y., Tsujii, J.: Automatic Resolution of Ambiguous Abbreviations in Biomedical Texts using SVM and One Sense per Discourse Hypothesis. In: SIGIR 2003 Workshop on Text Analysis and Search for Bioinformatics (2003)
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
HaCohen-Kerner, Y., Kass, A., Peretz, A. (2008). Abbreviation Disambiguation: Experiments with Various Variants of the One Sense per Discourse Hypothesis. In: Kapetanios, E., Sugumaran, V., Spiliopoulou, M. (eds) Natural Language and Information Systems. NLDB 2008. Lecture Notes in Computer Science, vol 5039. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69858-6_5
DOI: https://doi.org/10.1007/978-3-540-69858-6_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69857-9
Online ISBN: 978-3-540-69858-6
eBook Packages: Computer Science, Computer Science (R0)