Abstract
Data integration is generally performed through schema mappings that represent high-level correlations between heterogeneous data sources. Such mappings are generated from direct correspondences between data elements of the source and target schemas, while other semantic relations are neglected. In this paper, we first use hierarchical relationships among properties (property precedence) as fundamental semantic relations within source and target schemas to semantically enhance schema mappings. We then use global property precedence relations between source and target elements to achieve Configurable Data Integration (CDI). This configurable setting allows a trade-off between accuracy and completeness in query answering. Experiments with a working prototype of CDI show the potential of this approach in various data integration scenarios.
Appendix A
Examples of simple precedence (SPP), compound precedence (CPP), and co-precedence (COPP) relations extracted from the Amalgam dataset:
\(\mathrm{SPP}_{1}\) | \(m_{1} := \langle \textit{loc}, ``\textit{USA, WA}\hbox{''} \rangle \rightarrow m_{2} := \langle \textit{loc}, ``\textit{UnitedStates}\hbox{''} \rangle\) |
\(\mathrm{SPP}_{2}\) | \(m_{1} := \langle \textit{Descriptor}, ``\textit{queryProcessing}\hbox{''} \rangle \rightarrow m_{2} := \langle \textit{class}, ``\textit{database}\hbox{''} \rangle\) |
\(\mathrm{SPP}_{3}\) | \(m_{1} := \langle \textit{publisher}, ``\textit{ACM}\hbox{''} \rangle \rightarrow m_{2} := \langle \textit{Language}, ``\textit{english}\hbox{''} \rangle\) |
\(\mathrm{COPP}_{1}\) | \(m_{1} := \langle \textit{loc}, v_{i} \rangle \leftrightarrows m_{2} := \langle \textit{countryofOrigin}, v_{i} \rangle\) |
\(\mathrm{COPP}_{2}\) | \(m_{1} := \langle \textit{type}, ``\textit{techRep}\hbox{''} \rangle \leftrightarrows m_{2} := \langle \textit{type}, ``\textit{technicalReport}\hbox{''} \rangle\) |
\(\mathrm{CPP}_{1}\) | \([m_{1} := \langle \textit{countryOfPublication}, ``v_{i}\hbox{''} \rangle,\ m_{1}(x_{1})] \xrightarrow{\textit{located}(x_{1}, x_{2})} [m_{2} := \langle \textit{location}, v_{2} \rangle,\ m_{2}(x_{2})]\) |
\(\mathrm{CPP}_{2}\) | \([m_{1} := \langle \textit{Source}, ``\textit{IEEE Int. Conf. Data Eng}\hbox{''} \rangle,\ m_{1}(x_{1})] \xrightarrow{\textit{Recorded}(x_{1}, x_{2})} [m_{2} := \langle \textit{classificationCategory}, ``\textit{database}\hbox{''} \rangle,\ m_{2}(x_{2})]\) |
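To make the table more concrete, the following is a minimal Python sketch of how simple precedence and co-precedence rows could be stored and used to relax a query. It is not the CDI prototype. The names `Manifestation`, `expand`, and `answer`, the sample records, and the reading of the arrow as "an instance with the left-hand manifestation also has the right-hand one" are all assumptions made for illustration. Compound precedence rows, which relate two different instances, are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Manifestation:
    """A <property, value> pair, mirroring the m_i entries in the table (hypothetical type)."""
    prop: str
    value: str


# Rows SPP_1 and SPP_2, read left to right as "implies" (an assumed reading).
SIMPLE = [
    (Manifestation("loc", "USA, WA"), Manifestation("loc", "UnitedStates")),
    (Manifestation("Descriptor", "queryProcessing"), Manifestation("class", "database")),
]

# Row COPP_2, read as holding in both directions (an assumed reading).
CO = [
    (Manifestation("type", "techRep"), Manifestation("type", "technicalReport")),
]


def expand(target, use_precedence=True):
    """Return the target plus every manifestation that implies it."""
    found = {target}
    if not use_precedence:
        return found
    # Each co-precedence pair contributes one directed edge per direction.
    edges = SIMPLE + CO + [(b, a) for a, b in CO]
    frontier = [target]
    while frontier:
        current = frontier.pop()
        for source, implied in edges:
            if implied == current and source not in found:
                found.add(source)
                frontier.append(source)
    return found


def answer(records, target, use_precedence=True):
    """Select records carrying the target or any manifestation that implies it."""
    accepted = expand(target, use_precedence)
    return [
        r for r in records
        if any(Manifestation(p, v) in accepted for p, v in r.items())
    ]


# Hypothetical sample data, not taken from the Amalgam dataset.
records = [
    {"title": "A", "loc": "USA, WA"},
    {"title": "B", "loc": "UnitedStates"},
    {"title": "C", "loc": "Canada"},
]
query = Manifestation("loc", "UnitedStates")
strict = answer(records, query, use_precedence=False)
relaxed = answer(records, query, use_precedence=True)
```

In this sketch the `use_precedence` flag stands in for a configuration switch: with it off, only exact matches are returned, and with it on, records whose values imply the queried value are returned as well, so the answer set can only grow.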
Cite this article
Sekhavat, Y.A., Parsons, J. CDI: Configurable Data Integration Using Property Precedence Relations. J Data Semant 8, 1–19 (2019). https://doi.org/10.1007/s13740-019-00101-7
DOI: https://doi.org/10.1007/s13740-019-00101-7