Abstract
We combine mutual information scores with TF × IDF scores (IR scores) to select the best keyword translations in a transitive translation approach. The transitive translation uses bilingual dictionaries to translate an Indonesian query into Japanese keywords, which then serve as the input for retrieving Japanese documents. Keyword selection proceeds in two steps. The first step sorts the translation candidates by their mutual information scores, calculated from a monolingual target-language corpus. The second step selects the best candidate set from the five with the highest mutual information scores, based on their TF × IDF scores. An experiment on the NTCIR-3 Web Retrieval Task data shows that keyword selection based on this combination achieves a higher IR score than a direct translation method using an original Indonesian-Japanese dictionary, and also a higher score than machine translation using the Kataku (Indonesian-English) and Babelfish (English-Japanese) engines.
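The two-step selection can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy corpus, the use of summed pairwise pointwise mutual information over document co-occurrence, and the use of the best single-document TF × IDF sum as the IR score are all assumptions made for illustration, since the abstract does not specify these details.

```python
import math
from itertools import combinations, product


def pmi(a, b, docs):
    """Pointwise mutual information of two terms, from document co-occurrence."""
    n = len(docs)
    na = sum(1 for d in docs if a in d)
    nb = sum(1 for d in docs if b in d)
    nab = sum(1 for d in docs if a in d and b in d)
    if nab == 0:
        return 0.0  # assumption: pairs that never co-occur contribute nothing
    return math.log((nab * n) / (na * nb))


def mi_score(terms, docs):
    """Assumed set-level score: sum of PMI over all term pairs in the set."""
    return sum(pmi(a, b, docs) for a, b in combinations(terms, 2))


def tfidf_score(terms, docs):
    """Assumed IR score: highest summed TF x IDF of the terms in any one document."""
    n = len(docs)
    best = 0.0
    for d in docs:
        total = 0.0
        for t in terms:
            df = sum(1 for x in docs if t in x)
            if df:
                total += d.count(t) * math.log(n / df)
        best = max(best, total)
    return best


def select_keywords(candidates, docs, top_n=5):
    """Step 1: rank candidate sets by MI. Step 2: pick the best of the top_n by TF x IDF."""
    sets = list(product(*candidates))  # one translation per source term
    ranked = sorted(sets, key=lambda s: mi_score(s, docs), reverse=True)
    return max(ranked[:top_n], key=lambda s: tfidf_score(s, docs))


# Hypothetical toy data: two source terms, each with two Japanese candidates,
# and a four-document tokenized target-language corpus.
candidates = [["銀行", "土手"], ["金利", "興味"]]
docs = [
    ["銀行", "金利", "銀行"],
    ["銀行", "金利"],
    ["土手", "川"],
    ["興味", "映画"],
]
best_set = select_keywords(candidates, docs)
```

In this toy corpus only 銀行 and 金利 co-occur, so that pair ranks first by mutual information and also has the highest TF × IDF sum. In a realistic setting the same corpus need not serve both steps; the sketch reuses it only to stay short.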
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Purwarianti, A., Tsuchiya, M., Nakagawa, S. (2005). Query Transitive Translation Using IR Score for Indonesian-Japanese CLIR. In: Lee, G.G., Yamada, A., Meng, H., Myaeng, S.H. (eds) Information Retrieval Technology. AIRS 2005. Lecture Notes in Computer Science, vol 3689. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562382_51
DOI: https://doi.org/10.1007/11562382_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29186-2
Online ISBN: 978-3-540-32001-2
eBook Packages: Computer Science (R0)