
Correcting real-word spelling errors by restoring lexical cohesion

Published: 01 March 2005

Abstract

Spelling errors that happen to result in a real word in the lexicon cannot be detected by a conventional spelling checker. We present a method for detecting and correcting many such errors by identifying tokens that are semantically unrelated to their context and are spelling variations of words that would be related to the context. Relatedness to context is determined by a measure of semantic distance initially proposed by Jiang and Conrath (1997). We tested the method on an artificial corpus of errors; it achieved recall of 23–50% and precision of 18–25%.
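
The detection idea in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the authors' implementation: the toy taxonomy `PARENT`, the concept probabilities `PROB`, the threshold of 3.0, and the use of single-edit variations as "spelling variations" are all hypothetical. The only part taken from the literature is the Jiang and Conrath distance itself, IC(a) + IC(b) - 2 * IC(lowest common subsumer), where IC(c) = -log p(c).

```python
import math
import string

# Hypothetical toy taxonomy (child -> parent); a stand-in for a real lexical hierarchy.
PARENT = {
    "entity": None,
    "food": "entity", "meal": "food", "dessert": "food",
    "person": "entity", "chef": "person",
    "location": "entity", "desert": "location",
}
# Hypothetical concept probabilities; each parent is at least as probable as its children.
PROB = {
    "entity": 1.0, "food": 0.2, "meal": 0.05, "dessert": 0.05,
    "person": 0.3, "chef": 0.02, "location": 0.2, "desert": 0.03,
}

def information_content(concept):
    return -math.log(PROB[concept])

def ancestors(concept):
    """Return the concept followed by its ancestors up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain

def lowest_common_subsumer(a, b):
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    raise ValueError("concepts share no ancestor")

def jc_distance(a, b):
    """Jiang-Conrath distance: IC(a) + IC(b) - 2 * IC(lcs(a, b))."""
    lcs = lowest_common_subsumer(a, b)
    return (information_content(a) + information_content(b)
            - 2 * information_content(lcs))

def spelling_variations(word, lexicon):
    """Lexicon words one insertion, deletion, substitution or transposition away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {l + r[1:] for l, r in splits if r}
    transposes = {l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1}
    replaces = {l + ch + r[1:] for l, r in splits if r for ch in letters}
    inserts = {l + ch + r for l, r in splits for ch in letters}
    candidates = deletes | transposes | replaces | inserts
    return sorted((candidates & lexicon) - {word})

def is_related(word, context, threshold):
    return any(jc_distance(word, other) <= threshold for other in context)

def suggest_corrections(tokens, threshold=3.0):
    """Flag tokens unrelated to their context that have a related spelling variation."""
    lexicon = set(PARENT)
    content = [t for t in tokens if t in lexicon]  # skip words outside the toy lexicon
    suggestions = {}
    for i, token in enumerate(content):
        context = content[:i] + content[i + 1:]
        if is_related(token, context, threshold):
            continue  # token coheres with its context, so it is not a suspect
        related = [v for v in spelling_variations(token, lexicon)
                   if is_related(v, context, threshold)]
        if related:
            suggestions[token] = related
    return suggestions

result = suggest_corrections("the chef served desert after the meal".split())
print(result)  # {'desert': ['dessert']}
```

In this toy run, "chef" and "meal" are also unrelated to their context under the assumed threshold, but neither has a spelling variation that would be related, so neither is flagged. Requiring both conditions is what keeps an unrelated but correctly spelled word from being reported.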

References

[1]
Agirre, E., Gojenola, K., Sarasola, K. and Voutilainen, A. (1998) Towards a single proposal in spelling correction. Proceedings 36th Annual Meeting of the Association for Computational Linguistics and the 17th International Conference on Computational Linguistics (COLING-98), pp. 22-28. Montreal, Canada.
[2]
Agresti, A. and Finlay, B. (1997) Statistical Methods for the Social Sciences (3rd ed). Prentice-Hall.
[3]
Al-Mubaid, H. and Truemper, K. (forthcoming) Learning to find context-based spelling errors. In: Triantaphyllou, E. and Felici, G., editors, Data Mining and Knowledge Discovery Approaches Based on Rule Induction Techniques. Kluwer.
[4]
Atwell, E. and Elliott, S. (1987) Dealing with ill-formed English text. In: Garside, R., Leech, G. and Sampson, G., editors, The Computational Analysis of English: A Corpus-Based Approach, pp. 120-138. Longman.
[5]
Bahl, L. R., Jelinek, F. and Mercer, R. L. (1983) A maximum likelihood approach to continuous speech recognition. IEEE Trans. Patt. Anal. Machine Intell. 5(2): 179-190.
[6]
Barzilay, R. and Elhadad, M. (1999) Using lexical chains for text summarization. In: Mani, I. and Maybury, M. T., editors, Advances in Automatic Text Summarization, pp. 111-121. MIT Press.
[7]
Bernard, J. R. L. (editor) (1986) The Macquarie Thesaurus. Macquarie Library, Sydney.
[8]
Budanitsky, A. (1999) Lexical Semantic Relatedness and its Application in Natural Language Processing, Technical report CSRG-390, Department of Computer Science, University of Toronto. http://www.cs.toronto.edu/compling/Publications/Abstracts/Theses/Budanitskythabs.html
[9]
Budanitsky, A. and Hirst, G. (submitted) Evaluating WordNet-based measures of semantic relatedness.
[10]
Carlson, A. A., Rosen, J. and Roth, D. (2001) Scaling up context-sensitive text correction. Proceedings 13th Innovative Applications of Artificial Intelligence Conference, pp. 45-50. Seattle, WA.
[11]
Fellbaum, C. (editor) (1998) WordNet: An Electronic Lexical Database. MIT Press.
[12]
Finkelstein, L., Gabrilovich, E., Matias, Y., Rivlin, E., Solan, Z., Wolfman, G. and Ruppin, E. (2002) Placing search in context: The concept revisited. ACM Trans. Infor. Syst. 20(1): 116-131.
[13]
Flexner, S. B. (editor) (1983) Random House Unabridged Dictionary (2nd ed). Random House.
[14]
Fromkin, V. A. (1980) Errors in Linguistic Performance: Slips of the tongue, ear, pen, and hand. Academic Press.
[15]
Golding, A. R. (1995) A Bayesian hybrid method for context-sensitive spelling correction. Proceedings Third Workshop on Very Large Corpora, pp. 39-53. Boston, MA.
[16]
Golding, A. R. and Roth, D. (1996) Applying Winnow to context-sensitive spelling correction. In: Saitta, L., editor, Machine Learning: Proceedings 13th International Conference, pp. 182-190. Bari, Italy.
[17]
Golding, A. R. and Schabes, Y. (1996) Combining trigram-based and feature-based methods for context-sensitive spelling correction. Proceedings 34th Annual Meeting of the Association for Computational Linguistics, pp. 71-78. Santa Cruz, CA.
[18]
Golding, A. R. and Roth, D. (1999) A Winnow-based approach to context-sensitive spelling correction. Machine Learning, 34(1-3): 107-130.
[19]
Green, S. (1999) Building hypertext links by computing semantic similarity. IEEE Trans. Knowl. & Data Eng. 11(5): 713-731.
[20]
Halliday, M. A. K. and Hasan, R. (1976) Cohesion in English. Longman.
[21]
Hirst, G. and St-Onge, D. (1998) Lexical chains as representations of context for the detection and correction of malapropisms. In: Fellbaum, C. (editor) WordNet: An Electronic Lexical Database, pp. 305-332. MIT Press.
[22]
Hoey, M. (1991) Patterns of Lexis in Text. Oxford University Press.
[23]
Jiang, J. J. and Conrath, D. W. (1997) Semantic similarity based on corpus statistics and lexical taxonomy. Proceedings International Conference on Research in Computational Linguistics, Taiwan.
[24]
Jones, M. P. and Martin, J. H. (1997) Contextual spelling correction using latent semantic analysis. Proceedings Fifth Conference on Applied Natural Language Processing, pp. 166-173. Washington, DC.
[25]
Kernighan, M. D., Church, K. W. and Gale, W. A. (1990) A spelling correction program based on a noisy channel model. Proceedings 13th International Conference on Computational Linguistics, vol. 2, pp. 205-210. Helsinki, Finland.
[26]
Kukich, K. (1992) Techniques for automatically correcting words in text. ACM Comput. Surv. 24(4): 377-439.
[27]
Landauer, T. K., Foltz, P. W. and Laham, D. (1998) An introduction to latent semantic analysis. Discourse Processes, 25(2-3): 259-284.
[28]
Leacock, C. and Chodorow, M. (1998) Combining local context and WordNet similarity for word sense identification. In: Fellbaum, C. (editor) WordNet: An Electronic Lexical Database, pp. 265-283. MIT Press.
[29]
Lin, D. (1997) Using syntactic dependency as local context to resolve word sense ambiguity. Proceedings 35th Annual Meeting of the Association for Computational Linguistics and the 8th Conference of the European Chapter of the ACL, pp. 64-71. Madrid, Spain.
[30]
Lin, D. (1998) An information-theoretic definition of similarity. Proceedings 15th International Conference on Machine Learning, Madison, WI.
[31]
Mangu, L. and Brill, E. (1997) Automatic rule acquisition for spelling correction. Proceedings 14th International Conference on Machine Learning, pp. 734-741. Nashville, TN.
[32]
Mays, E., Damerau, F. J. and Mercer, R. L. (1991) Context based spelling correction. Infor. Process. Manage. 27(5): 517-522.
[33]
McHale, M. L. and Crowter, J. J. (1996) Spelling correction for natural language processing systems. Proceedings Conference on Natural Language Processing and Industrial Applications, Moncton, Canada.
[34]
Mihalcea, R. and Moldovan, D. (2001) Automatic generation of a coarse grained WordNet. Proceedings Workshop on WordNet and Other Lexical Resources: Applications, Extensions and Customizations, Second Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 35-40. Pittsburgh, PA.
[35]
Miller, G. A., Leacock, C., Tengi, R. and Bunker, R. T. (1993) A semantic concordance. Proceedings ARPA Human Language Technology Workshop, pp. 303-308. San Francisco, CA.
[36]
Mitton, R. (1987) Spelling checkers, spelling correctors, and the misspellings of poor spellers. Infor. Process. Manage. 23(5): 495-505.
[37]
Mitton, R. (1996) English Spelling and the Computer. Longman.
[38]
Morris, J. and Hirst, G. (1991) Lexical cohesion computed by thesaural relations as an indicator of the structure of text. Computational Linguistics, 17(1): 21-48.
[39]
Morris, J., Beghtol, C. and Hirst, G. (2003) Term relationships and their contribution to text semantics and information literacy through lexical cohesion. Proceedings 31st Annual Conference of the Canadian Association for Information Science, Halifax, Canada.
[40]
Okumura, M. and Honda, T. (1994) Word sense disambiguation and text segmentation based on lexical cohesion. Proceedings Fifteenth International Conference on Computational Linguistics (COLING-94), pp. 755-761. Kyoto, Japan.
[41]
Pedler, J. (2001a) Computer spellcheckers {sic} and dyslexics -- a performance survey. Br. J. Educ. Technol. 32(1): 23-37.
[42]
Pedler, J. (2001b) The detection and correction of real-word spelling errors in dyslexic text. Proceedings 4th Computational Linguistics UK Colloquium, pp. 115-119. Sheffield, UK.
[43]
Pollock, J. J. and Zamora, A. (1983) Collection and characterization of spelling errors in scientific and scholarly text. J. Am. Soc. Infor. Sci. 34(1): 51-58.
[44]
Resnik, P. (1995) Using information content to evaluate semantic similarity. Proceedings 14th International Joint Conference on Artificial Intelligence, pp. 448-453. Montreal, Canada.
[45]
Rubenstein, H. and Goodenough, J. B. (1965) Contextual correlates of synonymy. Comm. ACM, 8(10): 627-633.
[46]
St-Onge, D. (1995) Detecting and correcting malapropisms with lexical chains. Master's thesis, Department of Computer Science, University of Toronto. Published as Technical Report CSRI-319.
[47]
Silber, H. G. and McCoy, K. F. (2002) Efficiently computed lexical chains as an intermediate representation for automatic text summarization. Computational Linguistics, 28(4): 487-496.
[48]
Verberne, S. (2002) Context-sensitive spell {sic} checking based on trigram probabilities. Master's thesis, University of Nijmegen.
[49]
Vosse, T. G. (1994) The Word Connection. Doctoral dissertation, University of Leiden.
[50]
Vossen, P. (1998) EuroWordNet. Kluwer.
[51]
Zhao, Y. and Truemper, K. (1999) Effective spell {sic} checking by learning user behavior. Appl. Artif. Intell. 13(8): 725-742.



Published In

Natural Language Engineering, Volume 11, Issue 1
March 2005
129 pages

Publisher

Cambridge University Press

United States



Cited By

  • (2024) Automatic real-word error correction in Persian text. Neural Computing and Applications 36:29 (18125-18149). DOI: 10.1007/s00521-024-10045-0. Online publication date: 1-Oct-2024
  • (2023) Improving logical flow in English-as-a-foreign-language learner essays by reordering sentences. Artificial Intelligence 320:C. DOI: 10.1016/j.artint.2023.103935. Online publication date: 5-Jun-2023
  • (2023) “Easy” meta-embedding for detecting and correcting semantic errors in Arabic documents. Multimedia Tools and Applications 82:14 (21161-21175). DOI: 10.1007/s11042-023-14553-4. Online publication date: 22-Feb-2023
  • (2021) TIARA 2.0: an interactive tool for annotating discourse structure and text improvement. Language Resources and Evaluation 57:1 (5-29). DOI: 10.1007/s10579-021-09566-0. Online publication date: 24-Nov-2021
  • (2020) A survey of semantic relatedness evaluation datasets and procedures. Artificial Intelligence Review 53:6 (4407-4448). DOI: 10.1007/s10462-019-09796-3. Online publication date: 1-Aug-2020
  • (2020) Semantic association computation: a comprehensive survey. Artificial Intelligence Review 53:6 (3849-3899). DOI: 10.1007/s10462-019-09781-w. Online publication date: 1-Aug-2020
  • (2019) A-I-PoCoTo. Proceedings of the 3rd International Conference on Digital Access to Textual Cultural Heritage (19-24). DOI: 10.1145/3322905.3322908. Online publication date: 8-May-2019
  • (2019) Real-Word Errors in Arabic Texts. IEEE/ACM Transactions on Audio, Speech and Language Processing 27:8 (1308-1320). DOI: 10.1109/TASLP.2019.2918404. Online publication date: 1-Aug-2019
  • (2019) Automatic Proofreading in Chinese: Detect and Correct Spelling Errors in Character-Level with Deep Neural Networks. Natural Language Processing and Chinese Computing (349-359). DOI: 10.1007/978-3-030-32236-6_31. Online publication date: 9-Oct-2019
  • (2018) “UTTAM”. ACM Transactions on Asian and Low-Resource Language Information Processing 18:1 (1-26). DOI: 10.1145/3264620. Online publication date: 19-Nov-2018