Different Approaches to Bilingual Text Classification Based on Grammatical Inference Techniques

Jorge Civera, Elsa Cubel, Alfons Juan & Enrique Vidal

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 3523)

Included in the following conference series:

Iberian Conference on Pattern Recognition and Image Analysis


Abstract

Bilingual documentation has become common in many official institutions and private companies. In this scenario, the categorization of bilingual text is a useful tool that can also be applied in the machine translation field. To tackle this classification task, different approaches are proposed. On the one hand, two finite-state transducer algorithms from the grammatical inference domain are discussed. On the other hand, the well-known naive Bayes approximation is presented along with a possible model based on n-gram language models. Experiments carried out on a bilingual corpus demonstrate the adequacy of these methods and the relevance of a second information source in text classification, as supported by classification error rates. A relative reduction of 29% with respect to the best previous results on the monolingual version of the same task was obtained.

Work supported by the “Agència Valenciana de Ciència i Tecnologia” under grant GRUPOS03/031 and the Spanish project TIC2003-08681-C02-02.
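As a rough illustration of the naive Bayes approach with n-gram language models mentioned in the abstract, the sketch below scores a bilingual sentence pair under each class by adding a class log-prior to the log-probabilities given by two class-conditional n-gram models, one per language. It is a minimal sketch under stated assumptions, not the authors' implementation: the class names `NGramLM` and `BilingualNaiveBayes`, the add-one smoothing, the assumption that the two languages are independent given the class, and the toy Spanish-English data are all hypothetical choices made for this example.

```python
import math
from collections import Counter, defaultdict


class NGramLM:
    """Add-one smoothed n-gram model (smoothing choice is an assumption)."""

    def __init__(self, n=2):
        self.n = n
        self.ngrams = Counter()    # counts of (context..., word)
        self.contexts = Counter()  # counts of (context...)
        self.vocab = set()

    def _pad(self, tokens):
        return ["<s>"] * (self.n - 1) + list(tokens) + ["</s>"]

    def fit(self, sentences):
        for tokens in sentences:
            padded = self._pad(tokens)
            self.vocab.update(padded)
            for i in range(self.n - 1, len(padded)):
                context = tuple(padded[i - self.n + 1:i])
                self.ngrams[context + (padded[i],)] += 1
                self.contexts[context] += 1
        return self

    def log_prob(self, tokens):
        padded = self._pad(tokens)
        v = len(self.vocab) + 1  # one extra slot for unseen words
        total = 0.0
        for i in range(self.n - 1, len(padded)):
            context = tuple(padded[i - self.n + 1:i])
            num = self.ngrams[context + (padded[i],)] + 1
            den = self.contexts[context] + v
            total += math.log(num / den)
        return total


class BilingualNaiveBayes:
    """Hypothetical classifier: log p(c) + log p(src | c) + log p(tgt | c)."""

    def __init__(self, n=2):
        self.n = n
        self.log_prior = {}
        self.src_lm = {}
        self.tgt_lm = {}

    def fit(self, pairs, labels):
        by_class = defaultdict(list)
        for pair, label in zip(pairs, labels):
            by_class[label].append(pair)
        for label, items in by_class.items():
            self.log_prior[label] = math.log(len(items) / len(labels))
            self.src_lm[label] = NGramLM(self.n).fit(s for s, _ in items)
            self.tgt_lm[label] = NGramLM(self.n).fit(t for _, t in items)
        return self

    def score(self, src, tgt, label):
        return (self.log_prior[label]
                + self.src_lm[label].log_prob(src)
                + self.tgt_lm[label].log_prob(tgt))

    def predict(self, src, tgt):
        return max(self.log_prior, key=lambda c: self.score(src, tgt, c))


# Toy data invented for this example (not from the paper's corpus).
pairs = [
    ("una habitación doble".split(), "a double room".split()),
    ("una habitación con vistas".split(), "a room with a view".split()),
    ("la cuenta por favor".split(), "the bill please".split()),
    ("quiero pagar la cuenta".split(), "i want to pay the bill".split()),
]
labels = ["room", "room", "bill", "bill"]

clf = BilingualNaiveBayes(n=2).fit(pairs, labels)
print(clf.predict("una habitación individual".split(), "a single room".split()))
```

Dropping the target-language term from `score` gives a monolingual classifier, which is one simple way to see how much the second information source contributes on a given data set.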

Author information

Authors and Affiliations

Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia
Jorge Civera & Alfons Juan
Instituto Tecnológico de Informática, Universidad Politécnica de Valencia
Elsa Cubel & Enrique Vidal

Editor information

Editors and Affiliations

Instituto Superior Técnico & Instituto de Sistemas e Robótica, 1049-001, Lisboa, Portugal
Jorge S. Marques
ETSI Informática y de Telecomunicación, University of Granada, 18071, Granada, Spain
Nicolás Pérez de la Blanca
Instituto Superior Técnico, CERENA-Centro de Recursos Naturais e Ambiente, Av. Rovisco Pais, 1049-001, Lisboa, Portugal
Pedro Pina

About this paper

Cite this paper

Civera, J., Cubel, E., Juan, A., Vidal, E. (2005). Different Approaches to Bilingual Text Classification Based on Grammatical Inference Techniques. In: Marques, J.S., Pérez de la Blanca, N., Pina, P. (eds) Pattern Recognition and Image Analysis. IbPRIA 2005. Lecture Notes in Computer Science, vol 3523. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11492542_77

DOI: https://doi.org/10.1007/11492542_77
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26154-4
Online ISBN: 978-3-540-32238-2
eBook Packages: Computer Science, Computer Science (R0)
