Abstract
Is adaptation of English NLP applications the right way to go multilingual? Should one prefer ``language-independent'' systems with a view to applying them to a large number of different languages? Experience from the processing of Portuguese in several different areas (part-of-speech tagging, corpus tools, lexical decomposition, machine translation, etc.) suggests that neither of these offers a satisfactory solution.
This paper argues for a thorough study of the way individual languages work in order to develop applications suited for the language in question, i.e., ``language-dependent'' systems.
Cite this article
Santos, D. Toward Language-dependent Applications. Machine Translation 14, 83–112 (1999). https://doi.org/10.1023/A:1008169917741
DOI: https://doi.org/10.1023/A:1008169917741