OECM: A Cross-Lingual Approach for Ontology Enrichment

Shimaa Ibrahim, Said Fathalla, Hamed Shariat Yazdi, Jens Lehmann & Hajira Jabeen

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11762)

Included in the following conference series:

European Semantic Web Conference

Abstract

Due to the rapid expansion of multilingual data on the web, the development of approaches to enrich ontologies has become an interesting and active subject of research. In this paper, we propose a cross-lingual matching approach for ontology enrichment (OECM) in order to enrich an ontology using another one in a different natural language. A prototype of the proposed approach has been implemented and evaluated using the MultiFarm benchmark. Evaluation results show higher precision and recall in comparison to four other state-of-the-art approaches.

1 Introduction

The increasing amount of multilingual data on the Semantic Web has motivated many researchers to develop ontologies in various natural languages. Ontologies can be enriched by adding classes and/or relations extracted from other resources, even resources in another natural language [7]. Such enrichment is a resource-demanding and time-consuming task; therefore, automated or semi-automated ontology enrichment approaches are highly desirable. Most research efforts focus on enriching English ontologies from English resources rather than non-English ones, by applying ontology matching techniques [7]. This raises a key question: how can an ontology be enriched using another ontology in a different natural language? In order to enrich ontologies from multilingual resources, most recent efforts in cross-lingual ontology matching focus on one-to-one translation between ontology concepts [8]. Consequently, inappropriate translations negatively affect the quality of the matching process [8]. It is therefore important to develop approaches that are capable of enriching ontologies by selecting the best translation among all available translations (i.e. one-to-many translations) for a particular term. To the best of our knowledge, only our previous work [1] has addressed the problem of enriching ontologies from multilingual text. In this paper, we propose a new approach (OECM) to enrich an ontology, i.e. the target ontology T, using another one, i.e. the source ontology S, in a different natural language. The prominent feature of the proposed approach is the selection of the best translation among all available translations when matching classes across ontologies. This selection significantly improves the quality of the matching process. Furthermore, using ontologies as the source for the enrichment process can significantly reduce the cost of pre-processing and parsing the data used for the enrichment. To evaluate OECM, we compare its cross-lingual ontology matching process with four state-of-the-art approaches. The implementation of OECM and the datasets used for evaluation are publicly available at https://github.com/shmkhaled/OECM.

2 The Proposed Approach

Goal: We are given two ontologies S and T, in two different natural languages \(L_1\) and \(L_2\) respectively, as sets of RDF triples \(\langle s, p, o \rangle \in \mathcal {C} \times \mathcal {R} \times (\mathcal {C} \cup \mathcal {L})\), where \(\mathcal {C}\) is the set of ontology domain entities (i.e. classes), \(\mathcal {R}\) is the set of relations, and \(\mathcal {L}\) is the set of literals. We aim at finding the complementary information \(\mathcal {T}_{e} = S - (S \cap T)\) from S in order to enrich T. The methodology of the proposed approach comprises three phases: (1) pre-matching, (2) matching, and (3) enrichment. We consider only class labels, or local names, and three standard relations: rdfs:subClassOf, owl:equivalentClass, and owl:disjointWith.
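As a minimal sketch of the goal above, the complementary information can be written as a set difference over triples. The `Triple` type, the function name, and the example class names are illustrative assumptions, not part of the OECM implementation; the sketch also assumes both ontologies already use labels in the same language, which in the approach is only the case after the translation step.

```python
from typing import NamedTuple, Set


class Triple(NamedTuple):
    """An RDF-style triple <s, p, o> with plain string components."""
    s: str
    p: str
    o: str


def complementary(source: Set[Triple], target: Set[Triple]) -> Set[Triple]:
    """Return S - (S ∩ T): the triples of the source that the target lacks."""
    return source - (source & target)


# Illustrative (made-up) triples, not taken from the evaluated ontologies.
S = {
    Triple("Paper", "rdfs:subClassOf", "Document"),
    Triple("Author", "rdfs:subClassOf", "Person"),
}
T = {
    Triple("Author", "rdfs:subClassOf", "Person"),
}

T_e = complementary(S, T)
print(T_e)
```

Since \(S - (S \cap T)\) equals \(S - T\), the same result could be obtained with a plain set difference; the longer form is kept here only to mirror the definition.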

(1) Pre-matching: T and S are prepared before starting the matching phase by performing two tasks. (a) Pre-processing: The aim of this task is to prepare the local names and/or labels of the classes of S and T by employing several natural language processing techniques, such as tokenization, normalization, stop-word removal and POS-tagging. The output of this task is two sets of pre-processed classes, \(\mathcal {C}'_S\) and \(\mathcal {C}'_T\), for S and T respectively. (b) Translation: Each class in \(\mathcal {C}'_S\) is translated using Google Translator to the language of T (i.e. \(L_2\)). A list of translations is associated with each class; for example, the class label “Thema” in German has a list of two English translations: “Subject” and “Topic”. The best translation will be selected in the next phase.
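The pre-matching tasks can be illustrated with a small sketch. Everything here is an assumption made for illustration: the splitting rules, the English stop-word list, and the `TRANSLATIONS` dictionary, which merely stands in for a translation service and contains only the “Thema” example from the text. POS-tagging is omitted.

```python
import re
from typing import Dict, List

# Hypothetical, deliberately tiny stop-word list (English only).
STOP_WORDS = {"a", "an", "the", "of", "for", "in", "on", "to", "and"}


def preprocess(label: str) -> str:
    """Tokenize a local name or label, lowercase it, and drop stop words."""
    # Insert a space at camelCase boundaries, e.g. "ConferenceContributor".
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", label)
    # Split on whitespace, underscores, and hyphens.
    tokens = [t.lower() for t in re.split(r"[\s_\-]+", spaced) if t]
    return " ".join(t for t in tokens if t not in STOP_WORDS)


# Stand-in for a translation service: pre-processed source label -> candidates.
TRANSLATIONS: Dict[str, List[str]] = {"thema": ["Subject", "Topic"]}


def translate(source_class: str) -> List[str]:
    """Return all candidate translations of a class (one-to-many)."""
    return TRANSLATIONS.get(source_class, [])


print(preprocess("ConferenceContributor"))
print(preprocess("Member_of_the_Committee"))
print(translate(preprocess("Thema")))
```

Keeping the full list of candidates, rather than the first translation only, is what allows the best one to be chosen later during matching.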

(2) Matching: In order to identify which new information will be added to T, and where, potential matches between S and T should be identified. We use two types of matching: terminological matching and structural matching. (a) Terminological matching: This task is used to identify which information can be added to T. In order to choose the best translation for each class that matches the corresponding one in T, we perform a pairwise string matching between them. We chose Jaccard similarity as the string similarity metric because it achieved the best score in terms of precision in the experiments conducted for the ontology alignment task in the MultiFarm benchmark (see Note 1) [2]. We consider similarity scores greater than or equal to a specific threshold \(\theta \) to get the best matches. After running the experiments ten times, we obtained the value of \(\theta \) that gives the best matching results. If no match is found, the class is considered a new class, which is added to T. At the end, matched classes are validated by experts in order to confirm that the best translation is selected for each class. (b) Structural matching: This task is used to identify where the new information can be added to T. Each class in S is replaced by its best translation found in the previous matching in order to get a translated ontology \(S_{trans}\). We apply a pairwise triple comparison between \(S_{trans}\) and T to get the set of triples to be enriched, \(\mathcal {T}_{e}\), each of which is represented by \(\langle s, p, o, F \rangle \). Each triple is associated with a flag F, with a value of either ‘E’ for enrichment or ‘A’ for addition. For a particular triple, if \(s\in \mathcal {C}'_T\) and \(o \not \in \mathcal {C}'_T\), then F = ‘E’, i.e. this triple is needed to enrich the existing information in T, while if \(s\not \in \mathcal {C}'_T\) and \(o \in \mathcal {C}'_T\), then F = ‘A’, i.e. this triple is needed to add a new class to T.
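The selection of the best translation in terminological matching can be sketched as follows. This is an illustration under stated assumptions: Jaccard similarity is computed over lowercased word tokens (the text does not say whether tokens or characters are used), and the threshold value 0.5 is a placeholder, since the tuned value of \(\theta \) is not reported here.

```python
from typing import Iterable, Optional, Tuple


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the word-token sets of two labels."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def best_match(
    translations: Iterable[str],
    target_classes: Iterable[str],
    theta: float,
) -> Optional[Tuple[str, str, float]]:
    """Return (translation, target class, score) with the highest score
    that is >= theta, or None if no pair reaches the threshold."""
    targets = list(target_classes)
    best: Optional[Tuple[str, str, float]] = None
    for translation in translations:
        for target in targets:
            score = jaccard(translation, target)
            if score >= theta and (best is None or score > best[2]):
                best = (translation, target, score)
    return best


THETA = 0.5  # placeholder threshold, not the value tuned in the paper

# "Thema" has two candidate translations; only "Topic" matches a target class.
print(best_match(["Subject", "Topic"], ["topic", "conference paper"], THETA))
# No pair reaches the threshold, so the class would be treated as new.
print(best_match(["conference contribution"], ["conference paper"], THETA))
```

A result of `None` corresponds to the case described above in which the class is considered new and is added to T.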

(3) Enrichment: \(\mathcal {T}_{e}\) is used to enrich T according to the flag associated with each triple. We enrich the Scientific Events Ontology (SEO\(_{en}\)) [4], which has 49 classes in English, using the Conference\(_{de}\) ontology from the MultiFarm dataset (see Sect. 3), which has 60 classes in German. OECM has identified 15 new triples to enrich SEO\(_{en}\). For instance, one triple is used to add a new class ConferenceContributor, as a subClassOf Person, to SEO\(_{en}\). In addition, another triple is used to enrich SEO\(_{en}\) with additional information, i.e. adding a new subClassOf relation between its two classes. The complete list of 15 triples can be found in the GitHub repository. We have successfully enriched SEO\(_{en}\) with 93.75% of the triples identified by an expert.
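The flagging rule from the structural matching step, and its use during enrichment, might look like the sketch below. It assumes that class names in \(S_{trans}\) and T are directly comparable strings. The text defines the flag only for the two cases shown, so triples in any other case are left unflagged here. The `Agent` and `Event` classes and the `enrich` function are made up for illustration.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
FlaggedTriple = Tuple[str, str, str, str]


def flag(triple: Triple, target_classes: Set[str]) -> Optional[str]:
    """'E' if only the subject is known to T, 'A' if only the object is."""
    s, _, o = triple
    if s in target_classes and o not in target_classes:
        return "E"  # enrich existing information in T
    if s not in target_classes and o in target_classes:
        return "A"  # add a new class to T
    return None  # case not covered by the rule in the text


def triples_to_enrich(
    s_trans: Set[Triple], target: Set[Triple], target_classes: Set[str]
) -> List[FlaggedTriple]:
    """Compare the translated source with T and flag the missing triples."""
    flagged = []
    for triple in sorted(s_trans - target):
        f = flag(triple, target_classes)
        if f is not None:
            flagged.append((*triple, f))
    return flagged


def enrich(target: Set[Triple], t_e: List[FlaggedTriple]) -> Set[Triple]:
    """Assumed enrichment step: add every flagged triple to T."""
    return target | {(s, p, o) for s, p, o, _ in t_e}


target_classes = {"Person", "Event"}
target = {("Event", "owl:disjointWith", "Person")}
s_trans = {
    ("ConferenceContributor", "rdfs:subClassOf", "Person"),
    ("Person", "rdfs:subClassOf", "Agent"),
    ("Event", "owl:disjointWith", "Person"),
}

t_e = triples_to_enrich(s_trans, target, target_classes)
print(t_e)
print(len(enrich(target, t_e)))
```

In this toy input, the ConferenceContributor triple receives the flag ‘A’ because its subject is unknown to T while its object, Person, already exists there.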

Table 1. State-of-the-art comparison results

3 Evaluation

We use ontologies in the MultiFarm benchmark to measure the quality of the cross-lingual matching process. MultiFarm consists of seven ontologies, their translations into nine languages, and the corresponding cross-lingual alignments between them (i.e. the gold standard). We compare our results with four state-of-the-art approaches (see Table 1) for matching Conference\(_{de}\) with Ekaw\(_{en}\) and Conference\(_{de}\) with Edas\(_{en}\) ontologies. OECM outperforms all other systems in terms of precision, recall, and F-measure. For AML [3], the authors include pre-computed dictionaries with translations to overcome the query limit of Microsoft Translator, which decreases the efficiency of their approach. LogMap [5] depends mainly on the initial mappings to discover new mappings, which decreased after performing the translation. XMap [9] did not achieve satisfactory results because of many internal exceptions. Surprisingly, we found seven new alignments, which did not exist in the gold standard, when matching Conference\(_{de}\) with Ekaw\(_{en}\).
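For reference, precision, recall, and F-measure against a gold standard are conventionally computed as in the sketch below. The alignment pairs are made up for illustration and are not taken from MultiFarm or from Table 1; the function name is likewise an assumption.

```python
from typing import Set, Tuple

Alignment = Tuple[str, str]  # (source class, target class)


def evaluate(found: Set[Alignment], gold: Set[Alignment]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) of found alignments vs. gold."""
    true_positives = len(found & gold)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Made-up alignments: three of the four found pairs appear in the gold standard.
found = {
    ("Thema", "Topic"),
    ("Beitrag", "Contribution"),
    ("Person", "Person"),
    ("Tagung", "Workshop"),
}
gold = {
    ("Thema", "Topic"),
    ("Beitrag", "Contribution"),
    ("Person", "Person"),
    ("Tagung", "Conference"),
    ("Gutachter", "Reviewer"),
}

precision, recall, f_measure = evaluate(found, gold)
print(round(precision, 2), round(recall, 2), round(f_measure, 2))
```

Under this standard computation, an alignment that is correct but absent from the gold standard is counted as a false positive, so newly discovered alignments lower the measured precision rather than raise it.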

4 Conclusion

We present a new approach (OECM) for the enrichment of ontologies using other ontologies in different natural languages. Terminological and structural matching have been used in order to identify which information from the source ontology can be used to enrich the target ontology, and where. We consider all available translations for each term and select the best translation that matches the corresponding term in the target ontology. Such selection has significantly improved the quality of the matching process. It is worth mentioning that OECM has also found new alignments, which were missing in the gold standard. OECM outperforms most of the state-of-the-art systems in terms of precision, recall, and F-measure. We are in the process of investigating the usage of semantic similarity between terms in the matching process, in addition to considering other non-standard semantic relations and individuals in the enrichment process.

Notes

1. https://www.irit.fr/recherches/MELODI/multifarm/

References

1. Ali, M., Fathalla, S., Ibrahim, S., Kholief, M., Hassan, Y.: Cross-lingual ontology enrichment based on multi-agent architecture. Proc. Comput. Sci. 137, 127–138 (2018)
2. Cheatham, M., Hitzler, P.: String similarity metrics for ontology alignment. In: Alani, H., et al. (eds.) ISWC 2013. LNCS, vol. 8219, pp. 294–309. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-41338-4_19
3. Faria, D., et al.: Results of AML participation in OAEI 2018. In: Proceedings of the 13th International Workshop on Ontology Matching, pp. 125–131. CEUR-WS (2018)
4. Fathalla, S., Vahdati, S., Auer, S., Lange, C.: The scientific events ontology of the openresearch.org curation platform. In: Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing, pp. 2311–2313. ACM (2019)
5. Jiménez-Ruiz, E., Grau, V.C.: LogMap family participation in the OAEI 2018. In: Proceedings of the 13th International Workshop on Ontology Matching, pp. 187–191. CEUR-WS (2018)
6. Kachroudi, M., Diallo, G., Yahia, S.B.: OAEI 2018 results of KEPLER. In: Proceedings of the 13th International Workshop on Ontology Matching, pp. 173–178. CEUR-WS (2018)
7. Petasis, G., Karkaletsis, V., Paliouras, G., Krithara, A., Zavitsanos, E.: Ontology population and enrichment: state of the art. In: Paliouras, G., Spyropoulos, C.D., Tsatsaronis, G. (eds.) Knowledge-Driven Multimedia Information Extraction and Ontology Evolution. LNCS (LNAI), vol. 6050, pp. 134–166. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-20795-2_6
8. Trojahn, C., Fu, B., Zamazal, O., Ritze, D.: State-of-the-art in multilingual and cross-lingual ontology matching. In: Buitelaar, P., Cimiano, P. (eds.) Towards the Multilingual Semantic Web, pp. 119–135. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-43585-4_8
9. Djeddi, W.E., Yahia, S.B., Khadir, M.T.: XMap results for OAEI 2018. In: Proceedings of the 13th International Workshop on Ontology Matching, pp. 210–215. CEUR-WS (2018)

Author information

Authors and Affiliations

Smart Data Analytics (SDA), University of Bonn, Bonn, Germany
Shimaa Ibrahim, Said Fathalla, Hamed Shariat Yazdi, Jens Lehmann & Hajira Jabeen
Institute of Graduate Studies and Research, University of Alexandria, Alexandria, Egypt
Shimaa Ibrahim
Faculty of Science, University of Alexandria, Alexandria, Egypt
Said Fathalla
Enterprise Information Systems Department, Fraunhofer IAIS, Sankt Augustin, Germany
Jens Lehmann

Corresponding author

Correspondence to Shimaa Ibrahim.

Editor information

Editors and Affiliations

Kansas State University, Manhattan, KS, USA
Pascal Hitzler
Vienna University of Economics and Business, Vienna, Austria
Sabrina Kirrane
Linköping University, Linköping, Sweden
Olaf Hartig
Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
Victor de Boer
Leibniz Information Centre for Science and Technology University Library (TIB), Hannover, Germany
Maria-Esther Vidal
University of Bonn, Bonn, Germany
Maria Maleshkova
Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
Stefan Schlobach
Jönköping University, Jönköping, Sweden
Karl Hammar
F. Hoffmann-La Roche AG, Basel, Switzerland
Nelia Lasierra
Robert Bosch GmbH, Stuttgart, Germany
Steffen Stadtmüller
Aalborg University, Aalborg, Denmark
Katja Hose
IMEC, Ghent University, Ghent, Belgium
Ruben Verborgh

Cite this paper

Ibrahim, S., Fathalla, S., Yazdi, H.S., Lehmann, J., Jabeen, H. (2019). OECM: A Cross-Lingual Approach for Ontology Enrichment. In: Hitzler, P., et al. The Semantic Web: ESWC 2019 Satellite Events. ESWC 2019. Lecture Notes in Computer Science(), vol 11762. Springer, Cham. https://doi.org/10.1007/978-3-030-32327-1_20

DOI: https://doi.org/10.1007/978-3-030-32327-1_20
Published: 10 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32326-4
Online ISBN: 978-3-030-32327-1
eBook Packages: Computer Science, Computer Science (R0)
