Abstract
In an increasingly multilingual world, it is critical that information management tools organically support the simultaneous use of multiple natural languages. A prerequisite for efficiently achieving this goal is that the underlying database engines provide seamless matching of text data across languages. We propose SemEQUAL, a new SQL functionality for semantic matching of multilingual attribute data. Our current implementation defines matches based on the standard WordNet linguistic ontologies. A performance evaluation of SemEQUAL, implemented using standard SQL:1999 features on a suite of commercial database systems, indicates unacceptably slow response times. However, by tuning the schema and index choices to match typical linguistic features, we show that the performance can be improved to a level commensurate with online user interaction.
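To make the idea of WordNet-based matching with SQL:1999 features concrete, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes that a value matches a concept when the value's synset lies in the transitive closure of the concept's hyponyms, and that words in different languages share synset identifiers through an interlingual mapping. The table names (`word_synset`, `hyponym`), the function `sem_match`, and the toy data are all hypothetical; SQLite is used only because it supports recursive common table expressions.

```python
import sqlite3


def build_demo_db():
    """Create a toy in-memory taxonomy (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE word_synset (word TEXT, lang TEXT, synset INTEGER);
        CREATE TABLE hyponym (parent INTEGER, child INTEGER);
        -- Indexes on the lookup and traversal columns; an assumption about
        -- what helps, not the tuning reported in the paper.
        CREATE INDEX idx_word ON word_synset (word, lang);
        CREATE INDEX idx_hyponym_parent ON hyponym (parent);
    """)
    conn.executemany(
        "INSERT INTO word_synset VALUES (?, ?, ?)",
        [
            ("history", "en", 1), ("histoire", "fr", 1),
            ("biography", "en", 2), ("biographie", "fr", 2),
            ("autobiography", "en", 3), ("autobiographie", "fr", 3),
            ("fiction", "en", 4),
        ],
    )
    # Synset 1 (history) has hyponym 2 (biography), which has hyponym 3.
    conn.executemany("INSERT INTO hyponym VALUES (?, ?)", [(1, 2), (2, 3)])
    return conn


# Recursive query: start at the concept's synset(s), then repeatedly follow
# hyponym links; UNION removes duplicates so the recursion terminates.
SEM_MATCH_SQL = """
WITH RECURSIVE closure(synset) AS (
    SELECT synset FROM word_synset
    WHERE word = :concept AND lang = :concept_lang
    UNION
    SELECT h.child FROM hyponym h JOIN closure c ON h.parent = c.synset
)
SELECT EXISTS (
    SELECT 1 FROM word_synset w JOIN closure c ON w.synset = c.synset
    WHERE w.word = :value AND w.lang = :value_lang
)
"""


def sem_match(conn, value, value_lang, concept, concept_lang):
    """Return True if `value` falls under `concept` in the toy taxonomy."""
    params = {
        "value": value, "value_lang": value_lang,
        "concept": concept, "concept_lang": concept_lang,
    }
    return bool(conn.execute(SEM_MATCH_SQL, params).fetchone()[0])


if __name__ == "__main__":
    db = build_demo_db()
    print(sem_match(db, "autobiographie", "fr", "history", "en"))
    print(sem_match(db, "fiction", "en", "histoire", "fr"))
```

In this sketch the closure is recomputed for every call; the cost of such recursive traversal over a large taxonomy is the kind of overhead that schema and index choices would need to address.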
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kumaran, A., Haritsa, J.R. (2005). SemEQUAL: Multilingual Semantic Matching in Relational Systems. In: Zhou, L., Ooi, B.C., Meng, X. (eds) Database Systems for Advanced Applications. DASFAA 2005. Lecture Notes in Computer Science, vol 3453. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11408079_20
DOI: https://doi.org/10.1007/11408079_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25334-1
Online ISBN: 978-3-540-32005-0
eBook Packages: Computer Science (R0)