Abstract
In all languages, words cannot be ordered randomly in a sentence; they must be arranged in certain ways, both globally and locally. Scrambling the words of a sentence renders it meaningless. Although manually collected grammatical rules can improve a grammar checker's performance in diagnosing word order errors, repairing those errors remains very difficult. This work proposes a method for repairing word order errors in English sentences by reordering the words of a sentence and choosing the version that maximizes the number of trigram hits according to a language model. The novelty of the method lies in a permutation-filtering approach that reduces the search space of candidate sentences with reordered words. The filtering is based on bigram probabilities, and the search space is further reduced by applying a threshold to those probabilities. The experimental results show that more than 95% of the test sentences can be repaired using this technique. The comparative advantage of the method is that it is not restricted to a specific set of words, and it avoids the laborious and costly process of collecting word order errors to create error patterns. Unlike most approaches, the proposed method is applicable to any language, since a language model can be computed for any language. A parser and/or tagger is not required.
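The pipeline described in the abstract (filter reorderings by bigram probability, then pick the candidate with the most trigram hits) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy corpus, the unsmoothed probability estimate count(w1 w2) / count(w1), the default threshold of 0.05, and all function names are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical toy corpus standing in for the language model's training data.
TOY_CORPUS = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat sat on a mat",
]

def train(corpus):
    """Collect unigram counts, bigram counts and the set of seen trigrams."""
    unigrams, bigrams, trigrams = Counter(), Counter(), set()
    for sentence in corpus:
        tokens = sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
        trigrams.update(zip(tokens, tokens[1:], tokens[2:]))
    return unigrams, bigrams, trigrams

def bigram_prob(model, w1, w2):
    """Unsmoothed estimate of P(w2 | w1); an assumption, not the paper's estimator."""
    unigrams, bigrams, _ = model
    if unigrams[w1] == 0:
        return 0.0
    return bigrams[(w1, w2)] / unigrams[w1]

def filtered_orderings(words, model, threshold):
    """Enumerate reorderings, pruning any prefix whose last bigram falls below the threshold."""
    results = set()  # a set removes duplicates caused by repeated words

    def extend(prefix, remaining):
        if not remaining:
            results.add(tuple(prefix))
            return
        for i, word in enumerate(remaining):
            if prefix and bigram_prob(model, prefix[-1], word) < threshold:
                continue  # prune: this branch is never expanded further
            extend(prefix + [word], remaining[:i] + remaining[i + 1:])

    extend([], list(words))
    return results

def trigram_hits(tokens, model):
    """Count the trigrams of a candidate that were seen in the training corpus."""
    _, _, trigrams = model
    return sum(1 for t in zip(tokens, tokens[1:], tokens[2:]) if t in trigrams)

def repair(sentence, model, threshold=0.05):
    """Return the surviving reordering with the most trigram hits, or None if none survive."""
    words = sentence.lower().split()
    candidates = sorted(filtered_orderings(words, model, threshold))
    if not candidates:
        return None
    return " ".join(max(candidates, key=lambda c: trigram_hits(c, model)))
```

With this toy corpus, `repair("mat the on sat cat the", train(TOY_CORPUS))` returns `"the cat sat on the mat"`. Pruning while the ordering is being built, rather than generating all permutations and filtering afterwards, is one way to realize the search-space reduction that the abstract attributes to the bigram threshold.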
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Athanaselis, T., Bakamidis, S., Dologlou, I. (2006). A Fast Algorithm for Words Reordering Based on Language Model. In: Kollias, S., Stafylopatis, A., Duch, W., Oja, E. (eds) Artificial Neural Networks – ICANN 2006. ICANN 2006. Lecture Notes in Computer Science, vol 4132. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11840930_98
DOI: https://doi.org/10.1007/11840930_98
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-38871-5
Online ISBN: 978-3-540-38873-9
eBook Packages: Computer Science, Computer Science (R0)