
Different Approaches to Bilingual Text Classification Based on Grammatical Inference Techniques

  • Conference paper
Pattern Recognition and Image Analysis (IbPRIA 2005)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 3523)


Abstract

Bilingual documentation has become a common phenomenon in many official institutions and private companies. In this scenario, the categorization of bilingual text is a useful tool that can also be applied in the machine translation field. To tackle this classification task, different approaches will be proposed. On the one hand, two finite-state transducer algorithms from the grammatical inference domain will be discussed. On the other hand, the well-known naive Bayes approximation will be presented along with a possible modeling based on n-gram language models. Experiments carried out on a bilingual corpus have demonstrated the adequacy of these methods and, as supported by classification error rates, the relevance of a second information source in text classification. A relative reduction of 29% in classification error rate with respect to the best previous results on the monolingual version of the same task has been obtained.
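To make the n-gram idea concrete, here is a minimal sketch of a class-conditional bigram classifier for bilingual text pairs. It is not the authors' implementation, and it does not cover the finite-state transducer approaches. It assumes that the source and target sentences are conditionally independent given the class, so each class score is log P(c) + log P(x | c) + log P(y | c). For brevity it uses add-one (Laplace) smoothing, which may differ from the smoothing used in the paper. The class names `BigramLM` and `BilingualNGramClassifier` and the toy English/Spanish sentences are hypothetical.

```python
import math
from collections import Counter, defaultdict


class BigramLM:
    """Bigram language model with add-one smoothing (a simplifying assumption)."""

    def __init__(self, sentences, vocab):
        # Predictable symbols: every word in the shared vocabulary plus "</s>".
        self.vocab_size = len(vocab) + 1
        self.bigrams = Counter()
        self.unigrams = Counter()
        for words in sentences:
            tokens = ["<s>"] + words + ["</s>"]
            for prev, cur in zip(tokens, tokens[1:]):
                self.bigrams[(prev, cur)] += 1
                self.unigrams[prev] += 1  # count of prev as a bigram history

    def log_prob(self, words):
        tokens = ["<s>"] + words + ["</s>"]
        total = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            num = self.bigrams[(prev, cur)] + 1
            den = self.unigrams[prev] + self.vocab_size
            total += math.log(num / den)
        return total


class BilingualNGramClassifier:
    """Naive-Bayes-style classifier: one bigram LM per class and per language."""

    def fit(self, pairs, labels):
        # Vocabularies are shared across classes so class scores are comparable.
        src_vocab = {w for src, _ in pairs for w in src}
        tgt_vocab = {w for _, tgt in pairs for w in tgt}
        by_class = defaultdict(list)
        for pair, label in zip(pairs, labels):
            by_class[label].append(pair)
        n = len(labels)
        self.log_prior, self.src_lm, self.tgt_lm = {}, {}, {}
        for label, items in by_class.items():
            self.log_prior[label] = math.log(len(items) / n)
            self.src_lm[label] = BigramLM([s for s, _ in items], src_vocab)
            self.tgt_lm[label] = BigramLM([t for _, t in items], tgt_vocab)
        return self

    def score(self, src, tgt, label, use_target=True):
        # log P(c) + log P(src | c) [+ log P(tgt | c) as the second source]
        s = self.log_prior[label] + self.src_lm[label].log_prob(src)
        if use_target:
            s += self.tgt_lm[label].log_prob(tgt)
        return s

    def predict(self, src, tgt, use_target=True):
        return max(self.log_prior,
                   key=lambda c: self.score(src, tgt, c, use_target))


# Hypothetical toy training data: (source, target) sentence pairs with labels.
TRAIN = [
    ("book a double room", "reserve una habitacion doble", "room"),
    ("i want a single room", "quiero una habitacion individual", "room"),
    ("what time is breakfast", "a que hora es el desayuno", "meal"),
    ("when is dinner served", "cuando se sirve la cena", "meal"),
]
PAIRS = [(s.split(), t.split()) for s, t, _ in TRAIN]
LABELS = [c for _, _, c in TRAIN]

if __name__ == "__main__":
    clf = BilingualNGramClassifier().fit(PAIRS, LABELS)
    print(clf.predict("a double room".split(), "una habitacion doble".split()))
    # expected: room
```

Setting `use_target=False` gives the monolingual baseline, which is a simple way to compare one information source against two.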

Work supported by the “Agència Valenciana de Ciència i Tecnologia” under grant GRUPOS03/031 and the Spanish project TIC2003-08681-C02-02.




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Civera, J., Cubel, E., Juan, A., Vidal, E. (2005). Different Approaches to Bilingual Text Classification Based on Grammatical Inference Techniques. In: Marques, J.S., Pérez de la Blanca, N., Pina, P. (eds) Pattern Recognition and Image Analysis. IbPRIA 2005. Lecture Notes in Computer Science, vol 3523. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11492542_77


  • DOI: https://doi.org/10.1007/11492542_77

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26154-4

  • Online ISBN: 978-3-540-32238-2

  • eBook Packages: Computer Science, Computer Science (R0)
