
CDI: Configurable Data Integration Using Property Precedence Relations

  • Original Article
Journal on Data Semantics

Abstract

Generally, data integration is performed through schema mappings that represent high-level correlations between heterogeneous data sources. Such mappings are generated from direct correspondences between data elements of the source and target schemas, while other semantic relations are neglected. In this paper, we first use hierarchical relationships among properties (property precedence) as fundamental semantic relations within source and target schemas to semantically enhance schema mappings. Then, we use global property precedence relations between source and target elements to achieve Configurable Data Integration (CDI). This configurable setting allows a trade-off between accuracy and completeness in query answering. Experiments using a working prototype of CDI show the potential of this approach in various data integration scenarios.

Notes

  1. http://dblp.org/.

  2. https://europepmc.org/.

  3. https://www.drugbank.ca/.

  4. http://aiweb.cs.washington.edu/research/projects/xmltk/xmldata/.

Author information

Corresponding author

Correspondence to Yoones A. Sekhavat.

Appendix A

Examples of simple precedence (SPP), compound precedence (CPP), and co-precedence (COPP) relations extracted for the Amalgam dataset:

\(\mathrm{SPP}_{1}\)

\(m_{1} := \langle \mathit{loc}, \text{"USA, WA"} \rangle \rightarrow m_{2} := \langle \mathit{loc}, \text{"UnitedStates"} \rangle\)

\(\mathrm{SPP}_{2}\)

\(m_{1} := \langle \mathit{Descriptor}, \text{"queryProcessing"} \rangle \rightarrow m_{2} := \langle \mathit{class}, \text{"database"} \rangle\)

\(\mathrm{SPP}_{3}\)

\(m_{1} := \langle \mathit{publisher}, \text{"ACM"} \rangle \rightarrow m_{2} := \langle \mathit{Language}, \text{"english"} \rangle\)

\(\mathrm{COPP}_{1}\)

\(m_{1} := \langle \mathit{loc}, v_{i} \rangle \leftrightarrows m_{2} := \langle \mathit{countryofOrigin}, v_{i} \rangle\)

\(\mathrm{COPP}_{2}\)

\(m_{1} := \langle \mathit{type}, \text{"techRep"} \rangle \leftrightarrows m_{2} := \langle \mathit{type}, \text{"technicalReport"} \rangle\)

\(\mathrm{CPP}_{1}\)

\([\, m_{1} := \langle \mathit{countryOfPublication}, \text{"}v_{i}\text{"} \rangle,\ m_{1}(x_{1}) \,] \xrightarrow{\mathit{located}(x_{1}, x_{2})} [\, m_{2} := \langle \mathit{location}, v_{2} \rangle,\ m_{2}(x_{2}) \,]\)

\(\mathrm{CPP}_{2}\)

\([\, m_{1} := \langle \mathit{Source}, \text{"IEEE Int. Conf. Data Eng"} \rangle,\ m_{1}(x_{1}) \,] \xrightarrow{\mathit{Recorded}(x_{1}, x_{2})} [\, m_{2} := \langle \mathit{classificationCategory}, \text{"database"} \rangle,\ m_{2}(x_{2}) \,]\)
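To make the notation above concrete, the following is a minimal sketch and not the CDI prototype itself. It assumes that a simple precedence relation can be read as a one-way rule ("an instance that has m1 is also treated as having m2") and a co-precedence relation as a two-way rule. The names `Manifestation`, `build_rules`, and `closure` are hypothetical. Compound precedence relations and the variable-valued relation COPP1 are left out, since both need more than a lookup on fixed property/value pairs.

```python
from typing import Iterable, List, Set, Tuple

# A manifestation is a (property, value) pair, e.g. ("loc", "USA, WA").
Manifestation = Tuple[str, str]
Rule = Tuple[Manifestation, Manifestation]

# SPP_1..SPP_3 from the appendix, read (by assumption) as one-way rules m1 -> m2.
SIMPLE: List[Rule] = [
    (("loc", "USA, WA"), ("loc", "UnitedStates")),
    (("Descriptor", "queryProcessing"), ("class", "database")),
    (("publisher", "ACM"), ("Language", "english")),
]

# COPP_2 from the appendix, read (by assumption) as a two-way rule.
CO: List[Rule] = [
    (("type", "techRep"), ("type", "technicalReport")),
]


def build_rules(simple: Iterable[Rule], co: Iterable[Rule]) -> List[Rule]:
    """Flatten both kinds of relation into a list of directed rules."""
    rules = list(simple)
    for m1, m2 in co:
        rules.append((m1, m2))
        rules.append((m2, m1))  # co-precedence applies in both directions
    return rules


def closure(known: Iterable[Manifestation], rules: List[Rule]) -> Set[Manifestation]:
    """Return `known` plus every manifestation reachable through the rules."""
    result = set(known)
    changed = True
    while changed:  # repeat until no rule adds anything new
        changed = False
        for m1, m2 in rules:
            if m1 in result and m2 not in result:
                result.add(m2)
                changed = True
    return result


rules = build_rules(SIMPLE, CO)
expanded = closure({("loc", "USA, WA"), ("type", "techRep")}, rules)
```

Under these assumptions, an instance recorded with `("loc", "USA, WA")` would also match a query on `("loc", "UnitedStates")`, while the reverse does not hold, because the simple precedence rule is applied in one direction only.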

Cite this article

Sekhavat, Y.A., Parsons, J. CDI: Configurable Data Integration Using Property Precedence Relations. J Data Semant 8, 1–19 (2019). https://doi.org/10.1007/s13740-019-00101-7

  • DOI: https://doi.org/10.1007/s13740-019-00101-7
